An AI-powered application that converts patient–doctor conversations into structured clinical documentation. Record audio or upload a file, and get a speaker-diarized transcript, AI-generated SOAP notes, keyword-highlighted medical terms, and on-demand billing code suggestions — all processed ephemerally with no patient data storage.
- MediScript AI — AI-Powered Clinical Documentation
MediScript AI demonstrates how a two-stage generative AI pipeline — combining a speech-to-text model with a large language model — can convert raw clinical audio into structured, editable medical documentation.
Built as an open-source blueprint under the Cloud2 Labs Innovation Hub, MediScript AI is designed for:
- Healthcare innovation demos — show end-to-end AI clinical documentation in a browser with no infrastructure overhead
- Telemedicine platforms — integrate into post-visit documentation workflows
- Clinical scribing research — evaluate LLM accuracy on SOAP note generation and medical entity extraction
- Containerized deployments — ship directly to any Innovation Hub environment via Docker
The application processes audio entirely in-memory. No patient audio, transcripts, or personally identifiable information is stored at any point.
- The user records audio via the browser microphone or uploads an MP3/WAV file (up to 10 minutes).
- The Next.js frontend sends the audio to
/api/process-audioon the same origin; thin Route Handlers forward the request to the FastAPI backend usingBACKEND_INTERNAL_URLat request time (so Docker runtime env works; rewrites alone would bake URLs in at build time). - The backend forwards the audio to OpenAI Whisper (
whisper-1) withverbose_json, returning a timestamped array of transcript segments. - The segments are passed to GPT-4o, which determines which speaker is the Doctor and which is the Patient, generates a structured SOAP note, and extracts categorized medical keywords.
- The frontend renders the diarized transcript with color-coded keyword highlights and the formatted SOAP notes side by side.
- Optionally, the doctor can click Generate Billing Codes to POST the SOAP notes to
/api/generate-billing, which the Next.js server proxies to FastAPI; GPT-4o suggests CPT and ICD-10 codes. - The doctor can edit the AI-generated notes inline and export everything as TXT or Markdown.
MediScript AI is a two-service monorepo:
frontend/— Next.js (React) UI and static assets. It does not implement AI logic; same-origin/api/*Route Handlers proxy to FastAPI over HTTP usingBACKEND_INTERNAL_URL.backend/— FastAPI (Python) service that owns all OpenAI integration and exposes the application's REST API.
There is no database. All configuration for Docker runs is declared in docker-compose.yml. Services communicate over the Compose network using the backend service hostname (http://backend:8000) from the Next.js server.
graph TB
subgraph "Browser (localhost:3000)"
A[Next.js UI]
A1[Audio Recorder]
A2[File Upload]
A3[Transcript Display]
A4[SOAP Notes Editor]
A5[Export Buttons]
end
subgraph "Docker network"
F[Next.js server :3000]
B[FastAPI :8000]
end
subgraph "OpenAI API"
D[Whisper — whisper-1\nSpeech-to-Text]
E[GPT-4o\nDiarization + SOAP + Keywords]
G[GPT-4o\nCPT + ICD-10 Codes]
end
A1 --> F
A2 --> F
F -->|"proxy fetch → http://backend:8000/api/*"| B
B --> D
D -->|Timestamped Segments| B
B --> E
E -->|Utterances + SOAP + Keywords| B
B --> F
F --> A3
F --> A4
A4 -->|SOAP Notes JSON| F
F -->|proxy| B
B --> G
G -->|CPT + ICD-10| B
B --> F
F --> A4
A4 --> A5
style A fill:#11101B,color:#EDE9FE
style F fill:#242032,color:#EDE9FE
style B fill:#171522,color:#EDE9FE
style D fill:#7C3AED,color:#fff
style E fill:#7C3AED,color:#fff
style G fill:#7C3AED,color:#fff
Frontend (frontend/)
- Dark-mode-first UI built with Tailwind CSS and shadcn/ui components
- Audio recorder using the browser
MediaRecorderAPI with a live MM:SS timer and a 10-minute hard limit - MP3/WAV file upload as an alternative to live recording
- Transcript panel with speaker-labeled, timestamped dialogue and inline keyword highlighting (symptoms in red, medications in blue, diagnoses in purple)
- SOAP notes panel with an inline edit mode (editable Chief Complaint, Symptoms, Assessment, and Recommendation)
- On-demand billing code card displaying CPT and ICD-10 suggestions in badge format
- Export buttons for one-click clipboard copy and TXT/Markdown file downloads
- Next.js Route Handlers proxy all
/api/*requests to FastAPI at runtime usingBACKEND_INTERNAL_URL
Backend (backend/)
POST /api/process-audio— receives multipart audio (audioorfilefield), calls Whisper + GPT-4o, returns{ utterances, soapNotes, keywords }POST /api/generate-billing— receives the SOAP notes JSON body, calls GPT-4o with a medical-coder prompt, returns{ cpt, icd10 }GET /health— liveness check for container orchestration
External Integration
- OpenAI Whisper (
whisper-1) — speech-to-text withverbose_jsonresponse format for timestamped segment arrays - OpenAI GPT-4o — contextual speaker diarization, SOAP note generation, keyword extraction, and billing code suggestion; all calls use
response_format: json_objectfor structured output
| Service | Container | Host Port | Description |
|---|---|---|---|
frontend |
frontend |
3000 |
Next.js UI — serves the app and proxies /api/* to the FastAPI backend |
backend |
backend |
8000 |
FastAPI service — audio handling, Whisper STT, GPT-4o reasoning, billing API |
No third service is required. MediScript AI has no database, no message queue, and no object storage. Both containers communicate directly over the Compose Docker network.
- User records or uploads audio in the browser.
- Browser POSTs
FormDatato/api/process-audioon the same origin. - Next.js Route Handler forwards the request to FastAPI over the Docker network.
- FastAPI calls Whisper; segments are passed to GPT-4o; structured JSON is returned to the frontend.
- User optionally requests billing codes; browser POSTs SOAP notes JSON to
/api/generate-billing; the Route Handler proxies to FastAPI.
Before you begin, ensure you have the following installed and configured:
- Docker and Docker Compose (v2)
- An OpenAI API key — used for both Whisper (STT) and GPT-4o (reasoning)
For local development without Docker:
- Node.js 20+
- Python 3.12+
- npm
node --version
npm --version
python3 --version
docker --version
docker compose versiongit clone https://github.com/cld2labs/MediScriptAI.git
cd MediScriptAISet your OpenAI API key in the shell before running Compose. All container env vars are declared in docker-compose.yml — do not create .env files inside frontend/ or backend/:
export OPENAI_API_KEY="sk-your-openai-api-key-here"# Standard (attached — logs stream to terminal)
docker compose up --build
# Detached (background)
docker compose up -d --buildIf you renamed services from a previous version, remove old containers first:
docker compose down --remove-orphansOnce both containers are running:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs (Swagger): http://localhost:8000/docs
Important: Always use the hostname
localhost, not127.0.0.1or a LAN IP address. Browsers block microphone access on non-HTTPS origins —localhostis the only exception.
# Backend health check
curl http://localhost:8000/health
# View running containers
docker compose psView logs:
# All services
docker compose logs -f
# Backend only
docker compose logs -f backend
# Frontend only
docker compose logs -f frontenddocker compose downRun the backend and frontend in two separate terminals. Export all variables in your shell rather than relying on .env files in service subfolders.
Terminal 1 — FastAPI backend
cd backend
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
export OPENAI_API_KEY="sk-..."
export PORT=8000
./scripts/dev.sh
# or: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reloadTerminal 2 — Next.js frontend
cd frontend
npm install
export BACKEND_INTERNAL_URL="http://127.0.0.1:8000"
export NEXT_PUBLIC_API_BASE_URL=""
npm run devOpen http://localhost:3000. The Next.js dev server proxies all /api/* requests to FastAPI via BACKEND_INTERNAL_URL.
Production-style (no hot reload):
# Backend
cd backend && ./scripts/start.sh
# Frontend (after build)
cd frontend && npm run build && npm run startMediScriptAI/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI app entry point + /health route
│ │ ├── config.py # os.environ helpers and defaults
│ │ ├── prompts.py # GPT-4o system prompts (diarization, billing)
│ │ └── routers/
│ │ ├── process_audio.py # POST /api/process-audio (Whisper + GPT-4o)
│ │ └── generate_billing.py # POST /api/generate-billing (GPT-4o coder)
│ ├── scripts/
│ │ ├── dev.sh # uvicorn --reload for local dev
│ │ └── start.sh # uvicorn production (no reload)
│ ├── Dockerfile
│ ├── requirements.txt
│ └── .dockerignore
├── frontend/
│ ├── app/
│ │ ├── api/
│ │ │ ├── process-audio/
│ │ │ │ └── route.ts # Route Handler — proxies to FastAPI at runtime
│ │ │ └── generate-billing/
│ │ │ └── route.ts # Route Handler — proxies to FastAPI at runtime
│ │ ├── globals.css # Tailwind directives + shadcn CSS variables
│ │ ├── layout.tsx # Root layout — font, metadata, Toaster
│ │ └── page.tsx # Main page — state, layout, data flow
│ ├── components/
│ │ ├── ui/ # shadcn/ui base components
│ │ ├── AudioInput.tsx # Recorder + file upload + Process button
│ │ ├── TranscriptDisplay.tsx # Speaker labels, timestamps, keyword highlights
│ │ ├── SoapNotesDisplay.tsx # SOAP viewer + inline edit mode
│ │ └── ExportButtons.tsx # Copy, Download TXT, Download Markdown
│ ├── lib/
│ │ ├── apiConfig.ts # NEXT_PUBLIC_API_BASE_URL helper
│ │ └── utils.ts # Tailwind class merge utility (cn)
│ ├── public/
│ │ └── InnovationHub-HeaderImage.png
│ ├── Dockerfile
│ ├── next.config.ts # standalone output mode
│ ├── package.json
│ └── tsconfig.json
├── docker-compose.yml # Single source of container env vars (see file header)
├── .env.example # Reference for local shell exports
└── README.md
Recording a conversation
- Open the application at http://localhost:3000.
- In the left panel, click Start Recording and grant microphone access when prompted by the browser.
- Speak the patient–doctor dialogue. The live timer counts up in MM:SS format.
- Click Stop Recording. The audio is ready and a Process with AI button appears.
- Click Process with AI and wait for the pipeline to complete (typically 15–45 seconds).
Uploading an audio file
- Switch to the Upload tab in the left panel.
- Select or drag in an MP3 or WAV file (maximum 10 minutes of audio).
- Click Process with AI.
Reading the results
- The left panel shows the diarized transcript. Each line is formatted as
[MM:SS] Speaker: text. Medical terms are highlighted inline — symptoms in red, medications in blue, diagnoses in purple. - The right panel shows the AI-generated SOAP note broken into four sections: Chief Complaint, Symptoms, Assessment, and Recommendation.
Editing SOAP notes
- Click the pencil icon in the top-right corner of the SOAP notes card.
- Edit any field directly. Symptoms are presented as a multi-line text area (one symptom per line).
- Click Save changes. The export buttons will reflect your edits.
Generating billing codes
- After SOAP notes are generated, click ✨ Generate Billing Codes (CPT & ICD-10) below the notes card.
- The app sends only the SOAP note text to GPT-4o — not the audio file.
- Within a few seconds, a card appears with suggested CPT procedure codes and ICD-10 diagnosis codes, each with a short description.
Exporting
- Copy to Clipboard — copies the full transcript and SOAP notes as plain text.
- Download TXT — downloads
mediscript-notes.txtwith the transcript, SOAP notes, and billing codes (if generated). - Download Markdown — downloads
mediscript-notes.mdwith full Markdown formatting.
- Record in a quiet environment. Whisper accuracy degrades significantly with background noise, overlapping speech, or multiple speakers talking at the same time. A dedicated room or headset microphone produces the most accurate transcripts.
- Pause briefly between speakers. Whisper segments audio by timestamp, not by speaker channel. A natural 0.5–1 second pause between the doctor and patient speaking helps the model produce cleaner segment boundaries, which in turn gives GPT-4o a stronger signal for role assignment.
- Use clear clinical language. GPT-4o assigns Doctor/Patient roles based on dialogue context. Full sentences with clinical terminology — diagnoses, medication names, specific procedures — give the model the strongest signal. Heavily abbreviated or informal conversation may produce less reliable diarization.
- Keep recordings under 5 minutes for fastest results. Processing time scales with audio length. A 2-minute recording typically completes in under 20 seconds; a 10-minute recording may take 60–90 seconds end-to-end.
- Upload MP3 over WAV when possible. MP3 files are significantly smaller than WAV at equivalent quality, which reduces upload time to OpenAI's API, especially on slower connections.
- Generate billing codes as a separate step. The
/api/generate-billingroute sends only the SOAP notes text — not the audio — to GPT-4o. It is fast and inexpensive to call on demand and does not need to be generated upfront if the doctor may not require it.
The table below compares inference performance across providers and deployment modes using a standardized MediScript AI workload: billing LLM calls equivalent to POST /api/generate-billing (SOAP JSON → CPT/ICD), driven by POST /benchmark with benchmarks/default_inputs.json, averaged over 3 runs. Throughput is approximated as 1000 / P50_latency_ms sequential requests per second (single-stream).
| Provider | Model | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/s) | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|
| Ollama | qwen2.5:3b |
Local (Docker) | 8K† | 235.33 | 165.33 | 400.67 | 84,410 | 201,070 | 0.0118 | CPU-only (Docker Desktop Linux VM) |
| OpenAI (Cloud) | gpt-4o |
API (Cloud) | 128K | 444.84 | 307.67 | 752.5 | 6,090 | 10,590 | 0.1645 | Cloud (OpenAI-managed) |
† 8K is the effective context cap observed for this Ollama tag in Docker during the run; it is not necessarily the largest context the Qwen2.5 family supports on other runtimes.
Notes:
- MediScript harness: Same three SOAP cases in
benchmarks/default_inputs.jsonper run; metrics are aggregate averages across the 3 runs. Token totals may differ slightly between runs because of non-deterministic completions. - Ollama in Docker is CPU-only inside the typical Linux VM (no Metal). Bare-metal Ollama on macOS can use Metal (MPS); expect different latencies than the Docker row above.
A 3B-parameter open-weight instruct model from Alibaba’s Qwen2.5 line, pulled in Docker as qwen2.5:3b for on-device and on-prem chat (billing JSON and structured clinical output) when VLLM_CHAT_URL targets Ollama.
| Attribute | Details |
|---|---|
| Parameters | ~3B (Instruct; exact non-embedding split per model card on Hugging Face) |
| Architecture | Dense decoder-only Transformer (Instruct-tuned); runtime via Ollama GGUF bundles |
| Context window | Native context per Qwen2.5 card (often 32K+); effective window in MediScript is whatever Ollama exposes for the tag (commonly 8K–32K in practice) |
| Reasoning mode | Standard instruct completion (no separate “thinking” channel in MediScript integration) |
| Tool / function calling | Supported in general on some Qwen builds; MediScript uses JSON-structured chat responses for clinical payloads |
| Structured output | json_object for billing, SOAP, utterances, and keywords — validate outputs in production |
| Multilingual | Strong multilingual coverage (see Qwen2.5 model card) |
| Clinical use | Suitable for local / air-gapped experimentation; medical quality and JSON adherence may trail gpt-4o — human review required |
| Quantization formats | As packaged by Ollama (GGUF-style); other formats on Hugging Face for vLLM / llama.cpp workflows |
| Inference runtimes | Ollama, vLLM, llama.cpp, LM Studio, etc. |
| Fine-tuning | Community and full fine-tunes possible on Qwen2.5 checkpoints (see Hugging Face) |
| License | Apache 2.0 (verify per exact checkpoint card) |
| Deployment | Local, on-prem, air-gapped — full data sovereignty when weights stay in your network |
| MediScript default | docker compose -f docker-compose.yml -f docker-compose.qwen.yml with VLLM_CHAT_MODEL=qwen2.5:3b |
OpenAI’s flagship multimodal model, used in MediScript for contextual diarization (Doctor/Patient labels from Whisper segments), SOAP and keywords, and billing JSON (json_object). Default when VLLM_CHAT_URL is unset.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context window | 128,000 tokens input / 16,384 tokens max output |
| Reasoning mode | Standard inference (no explicit chain-of-thought toggle in MediScript) |
| Tool / function calling | Supported; parallel function calling available in the API |
| Structured output | JSON mode and strict JSON schema adherence supported (json_object for clinical outputs) |
| Multilingual | Broad multilingual support |
| Medical / coding | Strong general clinical and CPT/ICD-style suggestions; all outputs require expert review |
| Pricing | $2.50 / 1M input tokens, $10.00 / 1M output tokens (see OpenAI pricing for current rates) |
| Fine-tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
| Knowledge cutoff | April 2024 (per OpenAI model card) |
| Configurable via | OPENAI_CHAT_MODEL |
| Capability | Qwen2.5-3B (qwen2.5:3b) |
GPT-4o |
|---|---|---|
| Structured clinical JSON (SOAP, utterances, keywords, billing) | Yes (local inference) | Yes (cloud) |
| Function / tool calling | Model-dependent; MediScript uses JSON chat | Yes (API capability) |
| JSON structured output | Yes (json_object) |
Yes (json_object) |
| On-prem / air-gapped deployment | Yes | No |
| Data sovereignty | Full (weights run locally) | No (payloads sent to OpenAI API) |
| Open weights | Yes (Apache 2.0 family) | No (proprietary) |
| Custom fine-tuning | Full fine-tuning and adapters on open checkpoints | Supervised fine-tuning via OpenAI API only |
| Quantization for edge devices | GGUF / Ollama bundles; other formats via HF | N/A |
| Multimodal (image input) | Not used in MediScript UI | Yes (API capability) |
| Native context window (typical card) | 32K+ (runtime cap may be lower in Ollama) | 128K |
Both models support structured clinical JSON suitable for MediScript’s pipeline. However, only Qwen2.5 (local) offers open weights, data sovereignty, and on-prem flexibility — making it appropriate for air-gapped, regulated, or cost-sensitive environments when you accept higher latency on CPU and stricter validation. GPT-4o offers lower latency and often stronger JSON reliability via OpenAI’s cloud, with multimodal options for future UI work.
All providers are configured via environment variables declared in docker-compose.yml (or exported in your shell for local dev). Use OPENAI_API_KEY with unset VLLM_* URLs for cloud OpenAI, or set VLLM_CHAT_URL / VLLM_STT_URL to OpenAI-compatible endpoints for local or third-party hosts.
Default. Set OPENAI_API_KEY. Chat uses OPENAI_CHAT_MODEL (default gpt-4o); speech-to-text uses OPENAI_WHISPER_MODEL (default whisper-1) when VLLM_STT_URL is unset. No base_url override — traffic goes to OpenAI’s cloud API.
Local chat (and optional alignment with CodeTrans-style stacks): set VLLM_CHAT_URL to your Ollama OpenAI-compatible endpoint (e.g. http://ollama:11434/v1 in Compose) and VLLM_CHAT_MODEL to the tag (e.g. qwen2.5:3b). Use docker-compose.qwen.yml as a reference overlay. VLLM_API_KEY may be set to any non-empty string if the server expects a Bearer token.
Not wired by default. Groq exposes an OpenAI-compatible HTTP API; you could point VLLM_CHAT_URL (or a forked client) at Groq’s base URL and set VLLM_API_KEY to a Groq API key. This is not tested or documented as a first-class path in this repo—validate latency and json_object behavior before production use.
Not wired by default. Same pattern as Groq: any OpenAI-compatible gateway can theoretically be used by configuring base_url + API key for the chat client. MediScript does not ship OpenRouter-specific env keys; use VLLM_CHAT_URL / VLLM_API_KEY if you adapt the backend.
Use VLLM_CHAT_URL for chat completions and VLLM_STT_URL for /v1/audio/transcriptions (e.g. local vLLM, faster-whisper sidecar, or enterprise inference). Set VLLM_CHAT_MODEL / VLLM_STT_MODEL to the model ids your server expects. See docker-compose.whisper-hf.yml for a local STT example.
- Cloud OpenAI only (default) — Set your API key and leave local inference URLs unset so the OpenAI SDK uses the public API for both chat and STT.
export OPENAI_API_KEY=sk-...
# Do not set VLLM_CHAT_URL or VLLM_STT_URL- Local chat + cloud Whisper — Point chat at an OpenAI-compatible server (e.g. Ollama) while
OPENAI_API_KEYstill satisfies cloud STT.
export OPENAI_API_KEY=sk-...
export VLLM_CHAT_URL=http://ollama:11434/v1
export VLLM_CHAT_MODEL=qwen2.5:3b
export VLLM_API_KEY=ollama- Local STT + cloud chat — Use
VLLM_STT_URL(seedocker-compose.whisper-hf.yml) and keepOPENAI_API_KEYfor GPT chat.
export OPENAI_API_KEY=sk-...
export VLLM_STT_URL=http://stt-whisper-hf:8002/v1
export VLLM_STT_MODEL=Systran/faster-whisper-large-v3- All local — Set both
VLLM_CHAT_URLandVLLM_STT_URL(and models /VLLM_API_KEYas required). KeepOPENAI_API_KEYonly if some call path still hits OpenAI.
export VLLM_CHAT_URL=http://ollama:11434/v1
export VLLM_CHAT_MODEL=qwen2.5:3b
export VLLM_STT_URL=http://stt-whisper-hf:8002/v1
export VLLM_STT_MODEL=Systran/faster-whisper-large-v3
export VLLM_API_KEY=ollamaAfter changing provider variables, restart the backend (docker compose up -d or docker compose restart backend).
Model names: override OPENAI_CHAT_MODEL / OPENAI_WHISPER_MODEL for OpenAI cloud; use VLLM_CHAT_MODEL / VLLM_STT_MODEL for OpenAI-compatible servers.
Docker: Every variable injected into containers is listed and documented in docker-compose.yml. Set secrets in your shell before docker compose up:
export OPENAI_API_KEY=sk-...Local dev: Export the same variables in your shell (see .env.example for a reference checklist). The app does not auto-load .env files from inside frontend/ or backend/.
| Variable | Service | Description | Default | Type |
|---|---|---|---|---|
OPENAI_API_KEY |
backend | OpenAI API key — used for both Whisper and GPT-4o | — | string |
OPENAI_WHISPER_MODEL |
backend | Whisper model identifier for speech-to-text | whisper-1 |
string |
OPENAI_CHAT_MODEL |
backend | Chat model for diarization, SOAP generation, and billing | gpt-4o |
string |
VLLM_CHAT_URL |
backend | Optional OpenAI-compatible chat base URL (ends in /v1); unset = OpenAI cloud |
— | string |
VLLM_STT_URL |
backend | Optional OpenAI-compatible STT base URL for /v1/audio/transcriptions |
— | string |
VLLM_CHAT_MODEL |
backend | Model id for local chat server | Qwen/Qwen2.5-7B-Instruct |
string |
VLLM_STT_MODEL |
backend | Model id for local STT server | openai/whisper-large-v3 |
string |
VLLM_API_KEY |
backend | Bearer token for vLLM-compatible servers (often EMPTY) |
— | string |
This blueprint does not expose sampling or output caps as environment variables; structured clinical responses use response_format: { "type": "json_object" } in code. The table below mirrors common knobs for structural parity with other Innovation Hub READMEs—here they are not defined unless you add them.
| Variable | Service | Description | Default | Type |
|---|---|---|---|---|
LLM_TEMPERATURE |
backend | Not defined. Set in application code where completions are invoked, or add an env-backed option in app/config.py. |
— | float |
LLM_MAX_TOKENS |
backend | Not defined. No env cap on completion length; extend backend/app/services/ if needed. |
— | integer |
MAX_CODE_LENGTH |
backend | Not defined. No codegen-style length limit env var; billing uses structured JSON from the LLM. | — | integer |
Model choice (context, cost, latency) is controlled by OPENAI_CHAT_MODEL, OPENAI_WHISPER_MODEL, VLLM_CHAT_MODEL, and VLLM_STT_MODEL (see Core LLM Configuration).
The API is stateless: no server-side session store and no user accounts in the default stack. Each request carries its own payload; responses are built in memory and not persisted by the app.
| Variable | Service | Description | Default | Type |
|---|---|---|---|---|
CORS_ALLOW_ORIGINS |
backend | Not used in this blueprint. The browser calls Next.js /api/* only; FastAPI is reached server-side via BACKEND_INTERNAL_URL, so same-origin CORS to the backend is unnecessary. Add CORS middleware + your own env if you expose FastAPI directly to browsers. |
— | string |
Audio file uploads and recordings are bounded by backend limits (same knobs the UI enforces for duration and size):
| Variable | Service | Description | Default | Type |
|---|---|---|---|---|
MAX_AUDIO_MINUTES |
backend | Maximum audio duration accepted (minutes); applies to recorder and uploaded files | 10 |
integer |
MAX_FILE_SIZE_MB |
backend | Maximum upload size in megabytes (multipart audio) | 25 |
integer |
| Variable | Service | Description | Default | Type |
|---|---|---|---|---|
PORT |
both | Listen port inside the container | 8000 / 3000 |
integer |
BACKEND_INTERNAL_URL |
frontend | Base URL for proxying /api/* from the Next.js server to FastAPI |
http://backend:8000 |
string |
NEXT_PUBLIC_API_BASE_URL |
frontend | Optional browser-side API base. Set to "" for same-origin paths (recommended) |
"" |
string |
NODE_ENV |
frontend | Node environment (production in containers) |
production |
string |
NEXT_TELEMETRY_DISABLED |
frontend | Set to 1 to disable Next.js anonymous telemetry |
1 |
integer |
- Framework: FastAPI (Python 3.12+) with Uvicorn ASGI server
- STT Integration: OpenAI
whisper-1via theopenaiPython SDK —verbose_jsonformat returns timestamped segment arrays - LLM Integration: OpenAI
gpt-4ovia theopenaiPython SDK —json_objectresponse format for all structured output - Config Management:
os.environhelpers inapp/config.py— no additional env file library required in containers - Data Validation: Pydantic v2 for request/response schema enforcement
- Framework: Next.js (App Router) with React and TypeScript
- Styling: Tailwind CSS with a dark-mode-first custom color palette (deep purple
#8B5CF6primary, dark surface#05030Abackground) - Component Library: shadcn/ui (Radix UI primitives + Tailwind variants)
- Icons: Lucide React
- Font: Geist Sans (via
next/font) - API Proxy: Next.js Route Handlers proxy all
/api/*requests to FastAPI at runtime viaBACKEND_INTERNAL_URL - Production Build:
output: standaloneinnext.config.tsfor minimal Docker image size
For common issues and solutions, see below. For deeper investigation, use Debug Mode.
Issue: Microphone is blocked or not working
- Confirm you are accessing the app at exactly
http://localhost:3000, nothttp://127.0.0.1:3000or a LAN IP. Browsers enforce HTTPS for microphone access on all origins except the literal hostnamelocalhost. - Open browser DevTools → Console and look for a
NotAllowedError: Permission deniedmessage. If present, go to your browser's site settings and manually grant microphone access forlocalhost.
Issue: "Process with AI" returns an error or shows nothing
# Check backend logs for error details
docker compose logs -f backend
# Confirm the API key was injected at runtime
docker inspect $(docker compose ps -q backend) | grep OPENAI- Ensure
OPENAI_API_KEYwas exported in your shell before runningdocker compose up. - For local dev, restart both processes after changing env — Next.js and uvicorn only read environment variables at startup.
- Verify the key is valid and has sufficient quota at platform.openai.com/usage.
Issue: Frontend cannot reach the backend
# Verify both containers are running
docker compose ps
# Test backend directly
curl http://localhost:8000/health- Confirm
BACKEND_INTERNAL_URLindocker-compose.ymlishttp://backend:8000(Compose DNS name, notlocalhost). - Confirm both containers are on the same Docker Compose network.
Issue: Processing takes very long or times out
- Whisper transcription time scales with audio file size. A 10-minute recording may take 30–50 seconds end-to-end.
- Check your upload bandwidth — the raw audio file is sent to OpenAI's servers.
- Check status.openai.com for active incidents.
Issue: Speaker diarization is inaccurate
- GPT-4o assigns speaker roles from dialogue context alone. Very short recordings or conversations with minimal clinical language may produce unreliable results.
- Record longer exchanges with clear clinical language — the doctor asking diagnostic questions and the patient describing specific symptoms produces the strongest contextual signal.
- Roles can always be corrected manually using the inline SOAP notes edit mode.
Issue: Docker build fails
# Rebuild from scratch with no cache
docker compose build --no-cache- Confirm
output: "standalone"is present infrontend/next.config.ts. Without it, the multi-stage frontend Dockerfile cannot locate the standalone server file. - Ensure Docker Desktop has at least 2 GB of memory allocated for the Next.js build step.
Issue: Billing codes are too generic
- GPT-4o generates codes from SOAP note text only. Vague or very short notes produce less specific codes.
- Edit the SOAP notes to include specific procedures, medication names, and diagnoses before clicking Generate Billing Codes — richer context produces more precise suggestions.
Enable verbose logging on the backend for deeper inspection:
# Local dev — start uvicorn with debug log level
cd backend
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload --log-level debugOr stream real-time logs from running Docker containers:
# Backend
docker compose logs -f backend
# Frontend (Next.js server-side logs)
docker compose logs -f frontend
# All services
docker compose logs -fThis project is licensed under our LICENSE file for details.
MediScript AI is provided as-is for demonstration and educational purposes as part of the Cloud2 Labs Innovation Hub.
- This application is not a certified clinical documentation system and should not be used for medical decision-making.
- AI-generated SOAP notes, transcripts, and billing code suggestions must be reviewed by a qualified clinician or medical coder before use in any real patient care or billing context.
- No patient audio, transcripts, or personally identifiable information is stored by this application. However, audio data is transmitted to OpenAI's API for processing — review OpenAI's data usage policies before processing real patient conversations.
- CPT and ICD-10 code suggestions are illustrative only. Submitting incorrect billing codes carries significant compliance and legal risk. Do not use AI-generated codes without expert review.
For full disclaimer details, see DISCLAIMER.md.
