Visit Notes is a small, personal experiment to help patients capture and revisit what they discussed during and after a medical appointment — privately, on their own device. It isn’t a commercial product; it’s a learning project I’m building in public. If you’re curious, I’d be grateful for your feedback.
Inspired by the original Abridge patient app (now discontinued), this project explores whether on‑device speech recognition and language models can provide a similar, patient‑empowering experience without relying on cloud services. Visit Notes is unaffiliated with Abridge.
Who it’s for
• Patients who want a clear record of what was said in a medical visit (with consent).
• Care partners who help track instructions, medications, and follow‑ups.
• Anyone who benefits from reviewing key points after an appointment.
What Visit Notes does
• Transcribes recordings you make (or import) with on‑device speech‑to‑text.
• Generates short topic lines and concise summaries to highlight the essentials.
• Extracts key details (e.g., follow‑ups, instructions, medications mentioned) into simple, structured text you can scan.
• Works offline after a one‑time model download — your audio and text stay on your device.
How it helps in practice
• During your visit (with consent): Record the conversation so you don’t have to rely on memory alone.
• After your visit: Get a readable transcript, a short summary, and a compact list of key points to review or share with a care partner.
Note: Visit Notes doesn’t offer live clinical decision support; it’s meant to help you remember and review what was said.
Why I’m building it
When the original Abridge app was sunset, I missed having a patient‑centric way to capture conversations with clinicians. Visit Notes is my attempt to learn, prototype, and see what feels genuinely helpful — while keeping everything private and on‑device.
How it works (technical overview)
- On‑device ASR: Whisper‑family model via WhisperKit handles transcription locally (see WhisperKitASRService).
- On‑device LLM: Summaries, topic sentences, and structured extraction are generated by a Phi‑3.5 chat model running through llama.cpp (wrapped with LLM.swift). No cloud inference.
- Streaming and stability: An actor‑isolated LLMWorker manages model lifecycle and streaming generation, with context sizes tuned for short vs. longer outputs. It resets KV‑cache state and staggers GPU reuse to avoid stalls.
- One‑time download: A resumable, integrity‑checked downloader fetches the model once; after that, everything runs offline.
- Observability: LLMStatistics tracks success rate and average inference time to guide ongoing tuning.
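To make the actor‑isolation idea concrete, here is a minimal sketch of the pattern described above. The type name, state enum, and token loop are illustrative placeholders (the app's real worker drives llama.cpp through LLM.swift); what the sketch shows is how confining all model state to one actor serializes generations, so a second caller simply awaits until the first finishes.

```swift
import Foundation

// Illustrative sketch of the actor-isolation pattern, not the app's actual
// LLMWorker API. All mutable model state lives inside the actor, so
// concurrent callers cannot interleave and corrupt KV-cache state.
actor LLMWorkerSketch {
    enum State { case idle, ready, generating }
    private var state: State = .idle

    // Actor isolation serializes calls: a second generation request
    // suspends until the current one completes and state is reset.
    func generate(prompt: String, onToken: @Sendable (String) -> Void) async -> String {
        state = .generating
        defer { state = .ready }
        var output = ""
        // Placeholder token loop; the real app streams tokens from the model.
        for token in prompt.split(separator: " ").map(String.init) {
            onToken(token)
            output += token + " "
        }
        return output.trimmingCharacters(in: .whitespaces)
    }
}
```

The design point is that "reset KV‑cache state between runs" only works if runs cannot overlap, which the actor guarantees for free.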
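The one‑time download combines two standard mechanics, sketched below with illustrative function names (not the app's actual downloader API): resuming a partial transfer via an HTTP Range header, and rejecting the file unless its SHA‑256 digest matches the expected value.

```swift
import Foundation
import CryptoKit

// Resume a partial download by asking the server to continue from the
// byte offset already on disk. Names here are illustrative.
func resumeRequest(url: URL, bytesAlreadyDownloaded: Int64) -> URLRequest {
    var request = URLRequest(url: url)
    if bytesAlreadyDownloaded > 0 {
        request.setValue("bytes=\(bytesAlreadyDownloaded)-", forHTTPHeaderField: "Range")
    }
    return request
}

// Integrity check: accept the model file only if its SHA-256 digest
// matches the published checksum.
func matchesExpectedDigest(data: Data, expectedHex: String) -> Bool {
    let digest = SHA256.hash(data: data)
    let hex = digest.map { String(format: "%02x", $0) }.joined()
    return hex == expectedHex.lowercased()
}
```

Verifying the digest before first use is what lets everything afterward run offline with confidence that the model file is intact.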
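The two metrics LLMStatistics tracks are simple to state precisely; the struct below is a small illustrative sketch (assumed names, not the app's actual type) of success rate and average inference time.

```swift
import Foundation

// Illustrative sketch of the metrics described above, with assumed names.
struct InferenceStatsSketch {
    private(set) var successes = 0
    private(set) var failures = 0
    private(set) var totalSeconds: Double = 0

    mutating func record(durationSeconds: Double, succeeded: Bool) {
        if succeeded {
            successes += 1
            totalSeconds += durationSeconds
        } else {
            failures += 1
        }
    }

    var successRate: Double {
        let total = successes + failures
        return total == 0 ? 0 : Double(successes) / Double(total)
    }

    // Averaged over successful runs only, so failed generations
    // don't skew the timing signal used for tuning.
    var averageSeconds: Double {
        successes == 0 ? 0 : totalSeconds / Double(successes)
    }
}
```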
Privacy
- Your audio and text are processed on your device.
- No transcripts or prompts are sent to a server for inference.
- The one‑time model file is downloaded securely, then used locally.
- I don’t collect personal data from your notes. Any diagnostics or feedback you share is opt‑in.
Important notes and disclaimers
- Not medical advice: Visit Notes is not a medical device and does not diagnose, treat, cure, or prevent any condition. Always follow guidance from your clinician.
- Get consent: Please obtain consent before recording, and follow applicable laws and clinic policies.
- Verify critical details: Transcription and summaries can be imperfect, especially with overlapping speakers, accents, or specialized terminology. Double‑check anything important.
- Not for emergencies: Do not rely on this app in emergency situations.
Current status
This is an early, experimental build. It may have rough edges and could change direction as I learn. I’m sharing it to gather real‑world feedback.
Known limitations
- Overlapping speakers can reduce accuracy; diarization is limited.
- Highly technical terms or drug names may need manual correction.
- Very long recordings may need to be trimmed to keep the app responsive.
- Performance and battery usage vary by device and model size.