feat: add MLX Audio batch transcription backend #325

Closed
calebchongc wants to merge 1 commit into silverstein:main from calebchongc:mlx-audio-batch

Conversation

@calebchongc

Contributor

Summary

  • Add transcription.engine = "mlx-audio" as a config-driven saved-audio transcription backend for final meeting/memo transcripts.
  • Add minutes setup --mlx-audio --mlx-audio-model <id> to create/use a local Python env, install mlx-audio, save config, and run a lightweight readiness check.
  • Add a warm JSONL helper for MLX Audio, reuse the timestamped transcript formatter, and keep batch transcripts strict about timed segments.
  • Make ASR diagnostics engine-neutral and include the selected transcription engine in pipeline logs.
  • Document config-only desktop usage, the timestamp contract, and gated real-model/reference benchmark tests.
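The "strict about timed segments" behaviour in the third bullet can be pictured roughly as below. This is a minimal sketch, not code from the PR: `Segment`, `TranscriptError`, `format_timed_transcript`, and the `[MM:SS]` line layout are hypothetical names and choices, assumed here only to illustrate rejecting backend output that lacks usable timing instead of emitting untimed text.

```rust
/// Hypothetical segment shape; the PR's real types may differ.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    start_secs: Option<f64>,
    end_secs: Option<f64>,
    text: String,
}

#[derive(Debug, PartialEq)]
enum TranscriptError {
    /// The segment at this index has no start or end time.
    MissingTiming(usize),
    /// The segment at this index has negative, non-finite, or reversed times.
    InvalidTiming(usize),
}

/// Render segments as `[MM:SS] text` lines. Any segment without valid
/// timing fails the whole transcript (the "strict" behaviour assumed here).
fn format_timed_transcript(segments: &[Segment]) -> Result<String, TranscriptError> {
    let mut out = String::new();
    for (i, seg) in segments.iter().enumerate() {
        let (start, end) = match (seg.start_secs, seg.end_secs) {
            (Some(s), Some(e)) => (s, e),
            _ => return Err(TranscriptError::MissingTiming(i)),
        };
        if !start.is_finite() || !end.is_finite() || start < 0.0 || end < start {
            return Err(TranscriptError::InvalidTiming(i));
        }
        // Truncate the start time to whole seconds for the label.
        let total = start as u64;
        out.push_str(&format!(
            "[{:02}:{:02}] {}\n",
            total / 60,
            total % 60,
            seg.text.trim()
        ));
    }
    Ok(out)
}
```

Returning an error for the first bad segment, rather than skipping it, keeps a partially timed transcript from being saved as if it were complete; whether the PR fails the whole file or handles it differently is not stated here.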

Scope

  • Saved-audio/batch processing only: minutes process, desktop post-recording processing, and meeting/memo processing.
  • No live transcript or dictation backend changes in this PR.
  • No desktop Settings dropdown changes; users configure MLX via minutes setup or Advanced -> Open config.
  • No speakrs/diarization changes.

Test plan

  • cargo check -p minutes-core
  • cargo test -p minutes-core mlx_audio --lib
  • cargo test -p minutes-core diagnosis_uses_engine_agnostic_segment_label --lib
  • cargo test -p minutes-cli mlx_audio
  • git diff --check

@vercel

vercel Bot commented Jun 12, 2026

Someone is attempting to deploy a commit to the evil genius laboratory Team on Vercel.

A member of the Team first needs to authorize it.

@calebchongc

Contributor Author

Closing this PR because this review stack should target my fork first, not the original upstream repository.

@silverstein

Owner

No problem on the close, and please do come back with it: this is one of the strongest contributions this repo has received. Zero new Rust dependencies, real scope discipline (batch only, with live and Settings UI explicitly deferred), honest docs, and the engine-neutral diagnostics refactor is valuable on its own.

So you have the direction signal while you work on your fork: I'd take this as an opt-in, explicitly experimental engine, with you as the owner of the MLX bridge as mlx-audio evolves. Minutes' default path stays Python-free, which means the one thing I care most about is the setup flow's failure modes. Most Minutes users have never seen a venv, so minutes setup --mlx-audio should be excellent at the unhappy paths: missing or ancient python3, pip install failures, a venv that got half-created. Clear error, clear fix, never a Python traceback.
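As an illustration of the "clear error, clear fix" idea for the missing or ancient python3 case, here is a minimal sketch. It is not taken from the PR: `SetupError`, `check_python`, `probe_python`, the message wording, and the `MIN_PYTHON` value of 3.10 are all assumptions made for the example, and the real minimum depends on what mlx-audio requires.

```rust
use std::fmt;
use std::process::Command;

/// Assumed minimum, for illustration only.
const MIN_PYTHON: (u32, u32) = (3, 10);

#[derive(Debug, PartialEq)]
enum SetupError {
    PythonMissing,
    PythonTooOld { found: (u32, u32) },
    PythonUnparseable,
}

impl fmt::Display for SetupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (maj, min) = MIN_PYTHON;
        match self {
            SetupError::PythonMissing => write!(
                f,
                "python3 was not found. Install Python {maj}.{min} or newer, then re-run setup."
            ),
            SetupError::PythonTooOld { found } => write!(
                f,
                "python3 is version {}.{}, but {maj}.{min} or newer is needed. Upgrade Python, then re-run setup.",
                found.0, found.1
            ),
            SetupError::PythonUnparseable => write!(
                f,
                "Could not read the python3 version. Run `python3 --version` to check your install."
            ),
        }
    }
}

/// Parse output such as "Python 3.11.4" into (major, minor).
fn parse_python_version(output: &str) -> Option<(u32, u32)> {
    let rest = output.trim().strip_prefix("Python ")?;
    let mut parts = rest.split('.');
    let major = parts.next()?.parse::<u32>().ok()?;
    // Keep only leading digits so a suffix like "13rc1" still parses.
    let minor_digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor = minor_digits.parse::<u32>().ok()?;
    Some((major, minor))
}

/// Turn the raw `python3 --version` output (or its absence) into a
/// user-facing result, never a raw traceback.
fn check_python(version_output: Option<&str>) -> Result<(u32, u32), SetupError> {
    let raw = version_output.ok_or(SetupError::PythonMissing)?;
    let found = parse_python_version(raw).ok_or(SetupError::PythonUnparseable)?;
    if found < MIN_PYTHON {
        return Err(SetupError::PythonTooOld { found });
    }
    Ok(found)
}

/// Run `python3 --version`; None if the binary is absent or exits non-zero.
fn probe_python() -> Option<String> {
    let out = Command::new("python3").arg("--version").output().ok()?;
    if !out.status.success() {
        return None;
    }
    Some(String::from_utf8_lossy(&out.stdout).into_owned())
}
```

A caller would combine the two as `check_python(probe_python().as_deref())` and print the `Display` text on failure. Keeping the version check a pure function over the captured output makes each unhappy path testable without a real Python install; pip failures and half-created venvs would need similar, separately mapped errors.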

Two mechanical notes for the re-submission: rebase on main first (transcribe.rs changed today: the whisper model now loads once per file and the chunk dispatch moved around, which touches the same area as your engine plumbing), and run node scripts/sync_site_release_version.mjs after adding tests so the Site Release Link Consistency check passes.

If it helps to discuss design before the next PR, open an issue and tag me. Looking forward to it.

2 participants