A CLI tool that summarizes audio, text files, or entire directories; records from input devices; and generates personalized tech podcasts. It supports multi-language output and runs locally by default with Ollama, or connects to cloud LLMs (OpenAI, Anthropic, etc.) via LiteLLM.
- Python 3.10+
- Ollama for local models, or an API key for a cloud provider (OpenAI, Anthropic, etc.)
- ffmpeg
pip install -r requirements.txt
# For local models (default):
ollama pull llama3.1:8b
# Or use a cloud model — set your API key:
export OPENAI_API_KEY=sk-...
# or
export ANTHROPIC_API_KEY=sk-ant-...
Config files (config.yaml, podcast_config.yaml) and output directories (output/, data/) are resolved relative to the project root, so the tool works correctly regardless of which directory you run it from.
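As a rough illustration of how paths can be anchored to the project root rather than the current working directory, here is a minimal sketch. The helper names `project_root` and `resolve_project_path` are hypothetical, and the sketch assumes the entry script lives in `<root>/src/`, as the `python src/main.py` commands suggest; the tool's actual implementation may differ.

```python
from pathlib import Path


def project_root(entry_file: str) -> Path:
    """Return the project root, assuming the entry script sits in <root>/src/."""
    # resolve() makes the path absolute, so the result does not depend
    # on the directory the command was launched from.
    return Path(entry_file).resolve().parent.parent


def resolve_project_path(entry_file: str, relative: str) -> Path:
    """Anchor a config file or output directory to the project root."""
    return project_root(entry_file) / relative
```

Inside `src/main.py`, a call such as `resolve_project_path(__file__, "config.yaml")` would then point at the same file no matter where the command is run.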
python src/main.py meeting.mp3
python src/main.py meeting.mp3 --hint "team standup meeting"
Output: output/meeting/transcript.md, summary.md
Use --hint to tell the summarizer what kind of audio it is (e.g., "team meeting", "university lecture", "interview"). This is a short label that guides tone and structure — it doesn't add domain knowledge (use --kb for that).
Summarize any supported text file (.md, .pdf, .docx, .txt, .html, .csv):
python src/main.py --summarize notes.pdf
python src/main.py --summarize report.md --output-language nl
Summarize an entire directory of files into one combined summary:
python src/main.py --summarize ./meeting_notes/
Or produce one summary per file with --per-file:
python src/main.py --summarize ./meeting_notes/ --per-file
Output: output/<name>/summary.md (combined) or output/<name>/summary_<filename>.md (per-file)
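The output naming scheme above can be sketched as a small path helper. This is an illustration only: the function name `summary_paths` is hypothetical, and it assumes that `<name>` is the input's base name and `<filename>` is each file's name without its extension, which the README does not spell out.

```python
from pathlib import Path


def summary_paths(source: str, files: list[str], per_file: bool = False,
                  output_root: str = "output") -> list[Path]:
    """Return the summary file(s) that a --summarize run would produce."""
    # Assumption: <name> is the base name of the input file or directory.
    out_dir = Path(output_root) / Path(source).stem
    if per_file:
        # Assumption: <filename> is the file name without its extension.
        return [out_dir / f"summary_{Path(f).stem}.md" for f in files]
    return [out_dir / "summary.md"]
```

For a directory `./meeting_notes/` containing `a.md` and `b.pdf`, this yields one combined path by default and two paths (`summary_a.md`, `summary_b.md`) with `per_file=True`.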
python src/main.py --podcast input.md
python src/main.py --podcast ./notes/ --output-language fr
Generates a podcast script from input text and converts it to audio. Can optionally enrich with articles from RSS feeds and web search.
Output: output/podcast_<name>/podcast.wav, script.md, sources.md
python src/main.py --record
python src/main.py --record --record-name "standup" --output-dir ./recordings
Records audio from a selected input device and saves it as a WAV file. The recording is standalone: it does not auto-start transcription or summarization. To summarize a recording afterwards, pass the file to the summarizer:
python src/main.py output/recordings/standup.wav
To record audio from meeting apps (Teams, Zoom, Google Meet), you need to capture system audio. This requires platform-specific setup:
macOS — Install BlackHole virtual audio driver:
brew install blackhole-2ch
Then open Audio MIDI Setup, click + to create a Multi-Output Device that includes both your speakers and BlackHole. Set this as your system output. When recording, select the BlackHole device. For more info, see https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device
Windows — Enable Stereo Mix in Sound settings (disabled by default on most systems), or install VB-Cable virtual audio device. Select it as the input device when recording.
Linux — PulseAudio monitor sources are usually available by default. Select the monitor source for your output device (e.g. Monitor of Built-in Audio).
AUDIO_FILE Path to audio file (optional)
--summarize PATH Summarize a text file or directory (md, pdf, docx, txt, ...)
--per-file When summarizing a directory, produce one summary per file
--podcast PATH Generate a podcast from a file or directory of text
--record Record audio from an input device
--record-name Optional name for the recording file
--hint Short label for audio type (e.g., 'team meeting', 'lecture') — guides tone
--kb Directory of reference docs for domain-aware summaries (RAG)
--kb-rebuild Force re-index the knowledge base
--embedding-model Fastembed model for KB embeddings (default: BAAI/bge-small-en-v1.5)
--model Whisper model size (default: medium)
--output-dir Output directory (default: output/<name>/)
--llm-model LLM model — Ollama, OpenAI (gpt-*), Anthropic (claude-*). Default from config.yaml
--input-language Audio language for Whisper transcription (e.g. auto, en, nl, de, fr, ja, zh). Default: auto
--output-language Output language for summaries and podcasts (e.g. en, nl, de, fr). Default from config or en
--chunk-minutes Chunk size in minutes (default: 10)
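The --llm-model description hints at routing by name prefix (gpt-* for OpenAI, claude-* for Anthropic, anything else for Ollama). A minimal sketch of that rule follows; the function `infer_provider` and its return values are hypothetical, and the real tool delegates to LiteLLM, which may apply its own conventions.

```python
def infer_provider(model: str) -> str:
    """Guess the provider from the model name prefix (assumed routing rule)."""
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("claude-"):
        return "anthropic"
    # Anything else, such as "llama3.1:8b" or "mistral:7b", is treated
    # as a local Ollama model in this sketch.
    return "ollama"
```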
On Apple Silicon Macs, the tool automatically uses mlx-whisper for GPU-accelerated transcription via Apple's MLX framework. This is significantly faster than the CPU-based faster-whisper backend.
- Automatic: If mlx-whisper is installed and you're on macOS, it's used by default
- Override: Set WHISPER_BACKEND=faster-whisper to force CPU, or WHISPER_BACKEND=mlx to force MLX
- Models are downloaded automatically from HuggingFace on first use
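The selection rules above could look roughly like the following. This is a sketch, not the tool's actual code: the function name `choose_whisper_backend` is hypothetical, and it assumes the mlx-whisper package is importable as `mlx_whisper`. The optional parameters exist only so the logic can be exercised without touching the real environment.

```python
import importlib.util
import os
import platform


def choose_whisper_backend(env=None, system=None, mlx_installed=None) -> str:
    """Pick 'mlx' or 'faster-whisper' following the documented rules."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    if mlx_installed is None:
        # Assumption: the mlx-whisper package installs a module named mlx_whisper.
        mlx_installed = importlib.util.find_spec("mlx_whisper") is not None

    # An explicit override always wins.
    override = env.get("WHISPER_BACKEND", "").strip().lower()
    if override in ("faster-whisper", "mlx"):
        return override

    # Automatic: use MLX on macOS when mlx-whisper is available.
    if system == "Darwin" and mlx_installed:
        return "mlx"
    return "faster-whisper"
```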
| CLI Model | MLX HuggingFace Repo |
|---|---|
| tiny | mlx-community/whisper-tiny |
| base | mlx-community/whisper-base |
| small | mlx-community/whisper-small |
| medium | mlx-community/whisper-medium |
| large-v2 | mlx-community/whisper-large-v2 |
| large-v3 | mlx-community/whisper-large-v3-turbo |
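The table translates directly into a lookup. The sketch below copies the mapping as listed; the names `MLX_REPOS` and `mlx_repo_for` are hypothetical and not taken from the tool's source.

```python
# CLI model name -> MLX HuggingFace repo, as listed in the table above.
MLX_REPOS = {
    "tiny": "mlx-community/whisper-tiny",
    "base": "mlx-community/whisper-base",
    "small": "mlx-community/whisper-small",
    "medium": "mlx-community/whisper-medium",
    "large-v2": "mlx-community/whisper-large-v2",
    "large-v3": "mlx-community/whisper-large-v3-turbo",
}


def mlx_repo_for(model: str) -> str:
    """Return the HuggingFace repo for a --model value, or raise ValueError."""
    try:
        return MLX_REPOS[model]
    except KeyError:
        known = ", ".join(sorted(MLX_REPOS))
        raise ValueError(f"Unknown model {model!r}; expected one of: {known}") from None
```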
Add a --kb flag pointing to a directory of reference documents to make summaries and podcasts more domain-aware:
python src/main.py meeting.mp3 --kb ./my_docs/
python src/main.py --podcast --kb ./my_docs/
KB content is used as background reference during summarization: it helps the LLM understand domain-specific terms, acronyms, and context, but doesn't steer or add to the output. Only chunks that score above a relevance threshold are included.
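To make the relevance filter concrete, here is a minimal sketch of threshold-based chunk selection. The `ScoredChunk` type, the `select_context` helper, the threshold of 0.5, and the limit of 5 are all illustrative assumptions; the README does not state the actual values or data structures.

```python
from typing import NamedTuple


class ScoredChunk(NamedTuple):
    text: str
    score: float  # similarity score from the vector search; higher means closer


def select_context(chunks, threshold=0.5, limit=5):
    """Keep only chunks scoring above the threshold, best first."""
    relevant = [c for c in chunks if c.score > threshold]
    relevant.sort(key=lambda c: c.score, reverse=True)
    return [c.text for c in relevant[:limit]]
```

If no chunk clears the threshold, the result is an empty list, so the summarizer would simply run without KB background.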
Supported formats: .txt, .md, .pdf, .docx, .html, .csv
On first run, documents are chunked, embedded, and stored in a local Qdrant vector store (data/kb_store/). Subsequent runs reuse the cached index. Use --kb-rebuild to re-index when files change:
python src/main.py meeting.mp3 --kb ./my_docs/ --kb-rebuild
By default the KB uses BAAI/bge-small-en-v1.5 (~130 MB, 384 dimensions). For better retrieval quality, use a larger model:
python src/main.py meeting.mp3 --kb ./my_docs/ --embedding-model BAAI/bge-base-en-v1.5
python src/main.py meeting.mp3 --kb ./my_docs/ --embedding-model BAAI/bge-large-en-v1.5
Popular fastembed models (downloaded automatically on first use):
| Model | Size | Dimensions |
|---|---|---|
| BAAI/bge-small-en-v1.5 (default) | ~130 MB | 384 |
| BAAI/bge-base-en-v1.5 | ~440 MB | 768 |
| BAAI/bge-large-en-v1.5 | ~1.2 GB | 1024 |
| sentence-transformers/all-MiniLM-L6-v2 | ~90 MB | 384 |
| nomic-ai/nomic-embed-text-v1.5 | ~560 MB | 768 |
Changing the embedding model requires re-indexing. The tool will detect the mismatch and ask you to add --kb-rebuild.
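One plausible way to detect such a mismatch is to store the embedding model name alongside the index and compare it on each run. The sketch below assumes exactly that; `index_action` and `EmbeddingModelMismatch` are hypothetical names, and how the tool actually records the model is not described here.

```python
from typing import Optional


class EmbeddingModelMismatch(RuntimeError):
    """Raised when the cached index was built with a different embedding model."""


def index_action(stored_model: Optional[str], requested_model: str,
                 rebuild: bool = False) -> str:
    """Decide whether to build or reuse the KB index.

    stored_model is the model name recorded with the cached index,
    or None when no index exists yet (assumed bookkeeping).
    """
    if rebuild or stored_model is None:
        return "build"
    if stored_model != requested_model:
        raise EmbeddingModelMismatch(
            f"Index was built with {stored_model}, but {requested_model} was "
            "requested. Re-run with --kb-rebuild."
        )
    return "reuse"
```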
Every time you run the tool, it asks which AI model you want to use. Your choice is saved to config.yaml so it becomes the default next time.
First run: The tool will walk you through picking a model:
- Choose between local (runs on your machine via Ollama) or cloud (OpenAI, Anthropic, etc.)
- For local models, it checks your hardware (RAM, GPU) and suggests the best model for your machine
- For cloud models, it asks for your API key once — then remembers it
Subsequent runs: The tool uses your last chosen model as the default. Just hit Enter to keep it, or pick a different one.
Your output language preference is also saved here (as output_language). Set it once with --output-language nl and it becomes the default for future runs.
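The precedence described here (command-line flag, then saved config, then en) can be sketched as follows. The helper `resolve_output_language` is hypothetical, and the config is modeled as a plain dict; writing it back to config.yaml is left out of the sketch.

```python
def resolve_output_language(cli_value, config):
    """Return the output language: CLI flag, else saved config, else 'en'."""
    if cli_value:
        # Remember the explicit choice so it becomes the default next time.
        # Persisting the dict to config.yaml is omitted in this sketch.
        config["output_language"] = cli_value
        return cli_value
    return config.get("output_language") or "en"
```

Calling it once with `"nl"` and then again with `None` on the same dict returns `"nl"` both times, which mirrors the set-once behavior.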
Skip the prompt entirely by passing a model on the command line:
python src/main.py meeting.mp3 --llm-model gpt-4o
python src/main.py meeting.mp3 --llm-model claude-sonnet-4-6
python src/main.py meeting.mp3 --llm-model mistral:7b
podcast_config.yaml controls podcast generation. Edit it to change voice, style, or sources:
tts:
  engine: piper                   # piper | macos_say
  voice: en_US-lessac-medium      # Piper model name (or macOS voice name)
  voice_host2: en_US-ryan-medium  # Second voice for two_host mode
  speed: 1.0
podcast:
  style: solo                     # solo | two_host
  max_articles: 5
  target_length: medium           # short (~3min) | medium (~7min) | long (~15min)
sources:
  feeds:
    - https://hnrss.org/newest?points=100
    - https://feeds.arstechnica.com/arstechnica/technology-lab
    - https://arxiv.org/rss/cs.AI
  web_search: true
When using Piper, you need to download voice model files (.onnx + .onnx.json) and place them in the voices/ directory. Browse available voices at https://github.com/rhasspy/piper/blob/master/VOICES.md.
mkdir -p voices && cd voices
python3 -m piper.download_voices en_US-ryan-medium
python3 -m piper.download_voices en_US-lessac-medium
The voice value in podcast_config.yaml must match the filename without .onnx (e.g. en_US-lessac-medium).
For macOS without extra setup, use engine: macos_say with a system voice name like Daniel.
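Because a Piper voice needs both a .onnx and a .onnx.json file whose names match the configured voice, a quick pre-flight check can catch typos before generation starts. This is a sketch under that naming assumption; `missing_voice_files` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path


def missing_voice_files(voice: str, voices_dir: str = "voices") -> list[Path]:
    """Return the Piper files for `voice` that are absent from voices_dir."""
    base = Path(voices_dir)
    # Piper expects the model and its JSON config side by side.
    required = [base / f"{voice}.onnx", base / f"{voice}.onnx.json"]
    return [p for p in required if not p.is_file()]
```

An empty result means both files are in place; otherwise the returned paths show exactly which files still need to be downloaded.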
pip install pytest
python -m pytest tests/ -v