Status
Whisper on-device STT is functional but too slow for production. Current architecture:
- 3-second audio chunks buffered via ScriptProcessorNode
- Single-threaded WASM inference (~3-5s per chunk)
- Chrome MV3 CSP blocks multi-threaded ONNX (blob: URLs in script-src)
- Result: ~6-8s latency per utterance, too slow for real-time intent matching
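The 3-second buffering above can be sketched as a pure accumulator, independent of the Web Audio plumbing. This is illustrative only: `ChunkBuffer` is a hypothetical name, and the 16 kHz mono rate is assumed from Whisper's expected input, not taken from the extension's actual code.

```typescript
// Hypothetical sketch: accumulate ScriptProcessorNode frames into 3 s chunks.
// 16 kHz mono is what Whisper expects; the actual capture rate may differ.
const SAMPLE_RATE = 16_000;
const CHUNK_SAMPLES = SAMPLE_RATE * 3; // 3-second chunks

class ChunkBuffer {
  private samples: number[] = [];

  // Append one audio frame; return any complete 3 s chunks ready for inference.
  push(frame: Float32Array): Float32Array[] {
    for (const s of frame) this.samples.push(s);
    const chunks: Float32Array[] = [];
    while (this.samples.length >= CHUNK_SAMPLES) {
      chunks.push(Float32Array.from(this.samples.splice(0, CHUNK_SAMPLES)));
    }
    return chunks;
  }
}
```

Keeping the buffer as plain logic makes the 3s/500ms chunk-size trade-off (option A below) a one-constant change.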
Web Speech API (Google) works well for real-time but requires internet.
Options
A: Streaming Whisper — 500ms chunks, lower accuracy, still WASM-limited
B: WebGPU backend — significant speedup, not all extension contexts support it
C: Post-processing — Web Speech for real-time, Whisper refines after capture ends
D: Accept Web Speech as default — Whisper is opt-in private mode with known latency
Recommendation: D for now, C as next evolution.
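Option C's hand-off could look like the sketch below: the interim Web Speech text drives intent matching immediately, and Whisper's refined transcript re-triggers matching only if it actually changes the text. Function names here are illustrative, not from the codebase.

```typescript
// Hypothetical sketch of option C's transcript hand-off:
// re-run intent matching only when Whisper's refinement changes the text.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

function needsRematch(interimText: string, refinedText: string): boolean {
  return normalize(interimText) !== normalize(refinedText);
}
```

In the common case where Whisper agrees with Web Speech, the user sees no flicker and no duplicate intent firing.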
Technical debt
- Migrate ScriptProcessorNode to AudioWorklet
- numThreads=1 forced by CSP — revisit if Chrome relaxes MV3
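The AudioWorklet migration might start with a relay processor like the one below, which posts each 128-sample render quantum to the main thread and keeps buffering off the audio thread. The class and processor names are hypothetical; the stub base class exists only so the logic can run (and be tested) outside a browser, where AudioWorkletProcessor is undefined.

```typescript
// Hypothetical AudioWorklet replacement for the ScriptProcessorNode capture.
// Outside a real worklet context AudioWorkletProcessor is undefined, so a
// stub base class stands in for it (e.g. under test).
const BaseProcessor: any =
  (globalThis as any).AudioWorkletProcessor ??
  class {
    port = { postMessage: (_msg: unknown) => {} };
  };

class RelayProcessor extends BaseProcessor {
  // AudioWorklet delivers 128-sample render quanta; copy each one out,
  // since the engine reuses the underlying buffers between calls.
  process(inputs: Float32Array[][]): boolean {
    const channel = inputs[0]?.[0];
    if (channel) this.port.postMessage(channel.slice());
    return true; // keep the node alive
  }
}

// registerProcessor only exists inside AudioWorkletGlobalScope.
if (typeof (globalThis as any).registerProcessor === "function") {
  (globalThis as any).registerProcessor("relay-processor", RelayProcessor);
}
```

Unlike ScriptProcessorNode, the worklet runs on the audio rendering thread, so capture keeps up even while WASM inference saturates the main thread.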
🤖 Generated with Claude Code