Ambient AI chat panel for macOS. Watches your screen, listens for wake words, streams context into the Providence Core conversation.
Providence Overlay is a transparent floating chat panel that runs as a menu bar companion to Providence Core. It captures screen + audio, classifies what you are doing, and feeds that context into your assistant's next turn. Think of it as a heads-up display for an LLM session, not a separate chatbot.
```sh
git clone https://github.com/gravitrone/providence-overlay.git
cd providence-overlay
make install
```

Add to `~/.providence/config.toml` (Providence Core side):
```toml
[overlay]
enable = true
auto_start = true
spawn = false
ui_mode = "chat"
chat_history_limit = 50
chat_alpha = 0.92
daily_token_budget = 50000
```

Launch Providence Core, then in a fresh shell:
```sh
open -n -a ~/Applications/"Providence Overlay.app" \
    --args --socket=$HOME/.providence/run/overlay.sock
```

macOS will prompt for Screen Recording, Accessibility, and Microphone permissions on first run. Grant each when asked.
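If you want to verify the grants programmatically before starting capture, the standard checks look like this. A minimal sketch: the `preflightPermissions` wrapper is hypothetical, but the three API calls are the documented ones for these TCC categories.

```swift
import AVFoundation
import ApplicationServices
import CoreGraphics

/// Hypothetical preflight mirroring the three permission prompts above.
func preflightPermissions() -> [String] {
    var missing: [String] = []
    // Screen Recording: false until granted in System Settings >
    // Privacy & Security > Screen Recording.
    if !CGPreflightScreenCaptureAccess() { missing.append("Screen Recording") }
    // Accessibility: needed to read the focused window's AX tree.
    if !AXIsProcessTrusted() { missing.append("Accessibility") }
    // Microphone: .notDetermined triggers the prompt on first capture.
    if AVCaptureDevice.authorizationStatus(for: .audio) != .authorized {
        missing.append("Microphone")
    }
    return missing
}
```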
Most AI assistants are stateless between turns. You screenshot, paste, explain, repeat. Every context switch is friction and every missed detail is a worse answer. Tools that try to solve this (Cluely, Rewind, Granola) either spam the model with every frame or hide behind private APIs that Apple breaks. Providence Overlay uses documented macOS APIs, respects TCC, and gates emissions aggressively so the token cost stays bounded.
Dedupe first, emit only on change, stop at the daily budget. That is the whole philosophy.
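In code, that gate can be as small as this. The type and field names below are illustrative, not the shipped implementation:

```swift
/// Illustrative emission gate: dedupe, change-only, budget-capped.
struct EmissionGate {
    var lastFrameHash: UInt64 = 0
    var tokensSpentToday: Int = 0
    let dailyTokenBudget: Int  // e.g. 50_000, from daily_token_budget above

    mutating func shouldEmit(frameHash: UInt64, estimatedTokens: Int) -> Bool {
        // Stop at the daily budget: hard shutoff, no partial emissions.
        guard tokensSpentToday + estimatedTokens <= dailyTokenBudget else {
            return false
        }
        // Emit only on change: an unchanged dHash means an unchanged frame.
        guard frameHash != lastFrameHash else { return false }
        lastFrameHash = frameHash
        tokensSpentToday += estimatedTokens
        return true
    }
}
```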
- Two rendering modes - Ghost panel fades in on suggestions and auto-hides. Chat panel stays persistent with scrollable history and a text input. Toggle between them via config or the menu bar.
- ScreenCaptureKit pipeline - Adaptive frame rate (0.2 fps idle, 1 fps active, 2 fps in meetings, 5 fps burst). dHash deduplication skips near-identical frames (sketched after this list).
- Local transcription - mlx-swift-audio runs whisper-large-v3-turbo (fp16) via MLX on Apple Silicon. "Hey Providence" wake word via on-device SFSpeechRecognizer. Cmd+Option+Space for push-to-talk.
- Accessibility-first context - Reads the focused window's AX tree for structured text instead of relying on OCR. Falls back to Vision framework for apps without AX support.
- Stealth by default - `sharingType = .none` on all panels and an auto-hide heuristic when Zoom, Teams, Meet, Chime, or FaceTime is frontmost.
- Bounded cost - Jaccard-similarity transcript gating (sketched after this list), 5-second heartbeat minimum, per-day token budget with automatic shutoff.
- Menu bar control - Toggle UI mode, pause capture, add apps to the exclusion list, view session token spend.
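The dHash dedupe and Jaccard transcript gating named above each reduce to a few lines. A sketch under the textbook definitions; function names are illustrative:

```swift
/// dHash: from a 9x8 grayscale thumbnail, compare each pixel to its
/// right neighbor; 64 comparisons become a 64-bit fingerprint.
func dHash(_ gray: [[UInt8]]) -> UInt64 {  // gray is 8 rows x 9 columns
    var hash: UInt64 = 0
    for row in 0..<8 {
        for col in 0..<8 {
            hash <<= 1
            if gray[row][col] < gray[row][col + 1] { hash |= 1 }
        }
    }
    return hash
}

/// Frames whose hashes differ in only a few bits count as duplicates.
func hammingDistance(_ a: UInt64, _ b: UInt64) -> Int {
    (a ^ b).nonzeroBitCount
}

/// Jaccard similarity over word sets gates near-identical transcript chunks:
/// |intersection| / |union|, where 1.0 means effectively the same text.
func jaccard(_ a: String, _ b: String) -> Double {
    let wa = Set(a.lowercased().split(separator: " "))
    let wb = Set(b.lowercased().split(separator: " "))
    guard !wa.isEmpty || !wb.isEmpty else { return 1.0 }
    return Double(wa.intersection(wb).count) / Double(wa.union(wb).count)
}
```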
```mermaid
flowchart LR
    A[Providence Core TUI] -- UDS NDJSON --> B[BridgeClient]
    B --> C[AppState]
    D[ScreenCaptureKit] --> E[CaptureService]
    E --> F[FrameDedupe + AXReader]
    F --> G[ContextCompressor]
    G -- context_update --> A
    H[AVAudioEngine] --> I[AudioService]
    I --> J[WhisperTranscriber]
    I --> K[WakeWordService]
    C --> L[SuggestionPanel]
    C --> M[ChatPanel]
    A -- assistant_delta --> C
```
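Messages cross the socket as newline-delimited JSON, one object per line. The field names below are assumptions for illustration; the actual wire format lives in BridgeClient:

```swift
import Foundation

/// Hypothetical shapes for the two message types in the diagram.
struct ContextUpdate: Codable {
    let type: String        // "context_update"
    let activity: String    // e.g. "coding", "meeting"
    let focusedApp: String
    let summary: String     // compressed screen/transcript context
}

struct AssistantDelta: Codable {
    let type: String        // "assistant_delta"
    let text: String        // streamed chunk of the assistant's reply
}

// Example NDJSON line (illustrative):
// {"type":"context_update","activity":"coding","focusedApp":"Xcode","summary":"..."}
```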
Two Swift Package Manager targets. ProvidenceOverlay is the executable with AppKit, SwiftUI, and all framework integrations. ProvidenceOverlayCore is a pure-logic library containing the activity classifier, perceptual hash, transcript similarity, and Codable models.
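The manifest presumably looks roughly like this. A sketch only: dependency entries such as mlx-swift-audio are omitted, and the repo's actual Package.swift is authoritative.

```swift
// swift-tools-version: 6.2
// Approximate shape of the manifest, not the repo's actual Package.swift.
import PackageDescription

let package = Package(
    name: "providence-overlay",
    platforms: [.macOS("15.4")],
    targets: [
        // Pure logic: activity classifier, perceptual hash, transcript
        // similarity, Codable models. No AppKit, so tests stay fast.
        .target(name: "ProvidenceOverlayCore"),
        // Executable: AppKit/SwiftUI panels, ScreenCaptureKit, audio.
        .executableTarget(
            name: "ProvidenceOverlay",
            dependencies: ["ProvidenceOverlayCore"]
        ),
        .testTarget(
            name: "ProvidenceOverlayCoreTests",
            dependencies: ["ProvidenceOverlayCore"]
        ),
    ]
)
```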
Context reaches the model via `<system-reminder origin="overlay">` blocks that Providence Core prepends to the next user turn. See the Providence Core repository for the TUI-side implementation.
- `Cmd+Shift+C` - Toggle chat panel visibility
- `Cmd+Shift+P` - Toggle ghost panel interactivity (click-through vs. clickable)
- `Cmd+Option+Space` - Push-to-talk, starts a 10-second recording window
- "Hey Providence" - Wake word (always listening when audio capture is active)
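A global key monitor is the usual way to implement the push-to-talk binding. A minimal sketch, assuming Accessibility is already granted (global monitors are silent without it); the recording call is hypothetical:

```swift
import AppKit

// keyCode 49 is the spacebar.
let monitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) { event in
    let mods = event.modifierFlags.intersection(.deviceIndependentFlagsMask)
    if mods == [.command, .option] && event.keyCode == 49 {
        // Hypothetical: record for 10 seconds, then hand the buffer
        // to WhisperTranscriber.
        // startPushToTalkWindow(duration: 10)
    }
}
```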
```sh
make build    # swift build -c release + ad-hoc codesign
make test     # swift test
make app      # wrap into Providence Overlay.app bundle
make install  # copy bundle to ~/Applications + shim at ~/.providence/bin
make clean    # remove .build and build/
```

Targets macOS 15.4+ (Swift 6.2 toolchain, mlx-swift-audio requirement). Tests require `DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` when run outside Xcode. On first launch, mlx-swift-audio loads the whisper-large-v3-turbo model (~1.5 GB fp16) from the HuggingFace cache at `~/.cache/huggingface/hub/`, downloading it if not cached.
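For example, to run the test suite from a plain shell:

```sh
DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer make test
```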
- Requires macOS 15.4+ due to the mlx-swift-audio toolchain.
- First launch pulls ~1.5 GB of model weights into `~/.cache/huggingface/hub/` if the cache is empty.
- System audio tap is stubbed. Microphone-only transcription in meetings for now.
- Wake word uses SFSpeechRecognizer instead of Porcupine. English-only, on-device.
- Chat history is in-memory. SQLite persistence is planned.
- On macOS 15+, `sharingType = .none` is ignored by ScreenCaptureKit. The auto-hide heuristic covers that gap for known screen-share apps (sketched below).
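The heuristic itself is a frontmost-app check. A sketch with guessed bundle identifiers; the shipped exclusion set may differ, and browser-based Meet would need a different signal:

```swift
import AppKit

/// Guessed bundle IDs for the apps named above, not the shipped list.
let meetingApps: Set<String> = [
    "us.zoom.xos",           // Zoom
    "com.microsoft.teams2",  // Teams (new client)
    "com.apple.FaceTime",    // FaceTime
]

/// Hide the panels whenever a known screen-share app is frontmost,
/// since sharingType = .none no longer keeps them out of the capture.
func shouldHidePanels() -> Bool {
    guard let front = NSWorkspace.shared.frontmostApplication,
          let bundleID = front.bundleIdentifier else { return false }
    return meetingApps.contains(bundleID)
}
```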
Pull requests welcome. For substantial changes, open an issue first to discuss. Follow the conventions in CLAUDE.md.