
Whisper STT: re-architect for real-time or defer to post-processing #43

@BraedenBDev

Description


Status

Whisper on-device STT is functional but too slow for production. Current architecture:

  • 3-second audio chunks buffered via a ScriptProcessorNode (a deprecated API)
  • Single-threaded WASM inference, roughly 3-5 s per chunk
  • Chrome MV3's CSP blocks multi-threaded ONNX Runtime (its threaded build spawns workers from blob: URLs, which script-src disallows)
  • Net result: ~6-8 s latency per utterance, too slow for real-time intent matching
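The chunk-buffering step above can be sketched as a pure helper. This is illustrative only: makeChunker and its parameters are hypothetical names, not the extension's actual code. In the real pipeline, push would be fed from a ScriptProcessorNode's onaudioprocess handler.

```javascript
// Accumulate incoming Float32 sample blocks into fixed-size chunks and
// invoke onChunk each time a full chunk (e.g. 3 s of audio) is ready.
function makeChunker(sampleRate, chunkSeconds, onChunk) {
  const target = Math.round(sampleRate * chunkSeconds);
  let buffer = new Float32Array(target);
  let filled = 0;
  return function push(samples) {
    let offset = 0;
    while (offset < samples.length) {
      const take = Math.min(target - filled, samples.length - offset);
      buffer.set(samples.subarray(offset, offset + take), filled);
      filled += take;
      offset += take;
      if (filled === target) {
        onChunk(buffer);                  // hand a full chunk to Whisper
        buffer = new Float32Array(target); // fresh buffer for the next chunk
        filled = 0;
      }
    }
  };
}

// Example: 16 kHz mono, 3 s chunks → a chunk completes at 48 000 samples
const chunks = [];
const push = makeChunker(16000, 3, (c) => chunks.push(c.length));
push(new Float32Array(40000));
push(new Float32Array(10000)); // crosses the 48 000-sample boundary
// chunks is now [48000]
```

Samples left over after a full chunk stay buffered, so no audio is dropped at chunk boundaries.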

The Web Speech API (backed by Google's server-side recognizer in Chrome) handles real-time transcription well, but it requires an internet connection.
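For reference, the real-time path can be wired roughly as below. This is a hedged sketch: the SpeechRecognition constructor is injected so the wiring is testable outside a browser (in Chrome it would be window.webkitSpeechRecognition), and onText is a hypothetical callback, not an existing API.

```javascript
// Configure a SpeechRecognition instance for continuous listening with
// interim (partial) results, forwarding each hypothesis to onText.
function startRealtimeSTT(SpeechRecognitionCtor, onText) {
  const rec = new SpeechRecognitionCtor();
  rec.continuous = true;     // keep listening across utterances
  rec.interimResults = true; // surface partial hypotheses immediately
  rec.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      onText(result[0].transcript, result.isFinal);
    }
  };
  rec.start();
  return rec;
}
```

Interim results are what make intent matching feel instantaneous; they arrive well before the final, corrected transcript.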

Options

A: Streaming Whisper — 500ms chunks, lower accuracy, still WASM-limited
B: WebGPU backend — significant speedup, not all extension contexts support it
C: Post-processing — Web Speech for real-time, Whisper refines after capture ends
D: Accept Web Speech as default — Whisper is opt-in private mode with known latency

Recommendation: D for now, C as next evolution.
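Option C amounts to a two-phase pipeline, sketched below under stated assumptions: liveTranscribe, whisperRefine, and onTranscript are hypothetical callbacks standing in for the Web Speech path, the Whisper path, and the consumer, not existing code.

```javascript
// Phase 1: surface Web Speech text immediately for intent matching.
// Phase 2: once capture ends, re-transcribe the buffered audio with
// Whisper and replace the live transcript with the refined result.
async function transcribeWithRefinement({ liveTranscribe, whisperRefine, onTranscript }) {
  // liveTranscribe resolves with the captured audio when the user stops;
  // along the way it pushes interim text through the callback.
  const audio = await liveTranscribe((text) => onTranscript(text, /* final */ false));
  // Whisper is slower but more accurate; its output becomes the final text.
  const refined = await whisperRefine(audio);
  onTranscript(refined, /* final */ true);
  return refined;
}
```

The consumer sees the same event stream in both modes: interim text first, a single final correction later, so downstream intent matching needs no mode-specific logic.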

Technical debt

  • Migrate ScriptProcessorNode to AudioWorklet
  • numThreads=1 forced by CSP — revisit if Chrome relaxes MV3
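The numThreads constraint is a small config fragment in onnxruntime-web; a sketch of why it is forced, assuming the onnxruntime-web package (the env flags below are its real API, the rationale comments are ours):

```javascript
import * as ort from "onnxruntime-web";

// The multi-threaded WASM build spawns workers from blob: URLs and needs
// SharedArrayBuffer (cross-origin isolation); MV3's script-src disallows
// blob:, so only the single-threaded build loads inside the extension.
ort.env.wasm.numThreads = 1;
// SIMD needs no extra workers, so it still works and is worth keeping on.
ort.env.wasm.simd = true;
```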

🤖 Generated with Claude Code

Metadata


Labels: enhancement (New feature or request)
