A pragmatic local-first coding assistant with automatic escalation.
mux runs coding tasks locally when it's confident, and escalates to cloud backends (Claude, Codex, Gemini) when it needs to. It's designed to maximize local execution while maintaining reliability through transparency and intelligent fallback.
Use mux when you want:
- Coding tasks to run on your machine, not in the cloud
- Automatic escalation when complexity increases or confidence is low
- Visibility into which tasks went local vs. cloud, and why
- Lower latency for routine refactors, fixes, and small features
- Local-first with smart escalation — runs tasks locally unless confidence is low, complexity is high, or verification fails
- Automatic fallback — escalates to Claude, Codex, or Gemini CLIs with one command
- Observability — ledger tracking of every task, routing decisions, and failure reasons
- Health checks — quickly verify local model availability and backend CLIs
- MCP integration — use as a tool in Claude Code or any MCP-compatible client
- File-aware — handles file creation, modification, and testing through structured materialization
After installation, you're ready to route tasks:
# Run a task (routes locally if available, escalates if needed)
python3 -m mux.cli run --task "Fix the null pointer bug in UserService and keep tests passing"
# Check health of local model and backends
python3 -m mux.cli doctor
# View routing stats and recent activity
python3 -m mux.cli status

Tasks with file paths or explicit intent (create, build, scaffold) automatically use the local executor:
# Local executor will write these files to disk
python3 -m mux.cli run --task "Create a Python CLI project in /tmp/my-app with README and tests"View what was routed, why, and recent failures:
cat logs/run_ledger.jsonl | tail -5 | jq .

Local tasks stay local when:
- Local model is available
- Task confidence is ≥0.6 (adjustable)
- No escalation keywords detected (see below)
Tasks escalate to cloud when:
- Local model is unavailable
- Confidence drops below threshold
- Complexity keywords detected: `security`, `migration`, `architecture`, `multi-file`, `refactor`, `concurrency`, etc.
- Verification repeatedly fails
File-creation tasks (detected by path or keywords like create, build, scaffold) route through local executor first, then optionally get SOTA peer review.
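For readers who prefer code, here is a minimal sketch of that decision flow. It is illustrative only — the function and variable names are hypothetical, not mux's actual `router.py` API — and the keyword list and 0.6 threshold simply mirror the defaults described above.

```python
# Illustrative sketch of the stay-local / escalate decision described above.
# Not mux's actual router.py code; keywords and threshold mirror the documented defaults.
ESCALATION_KEYWORDS = {"security", "migration", "architecture",
                       "multi-file", "refactor", "concurrency"}

def should_escalate(task: str, local_up: bool, confidence: float,
                    threshold: float = 0.6) -> tuple[bool, str]:
    """Return (escalate?, reason) for a single task."""
    if not local_up:
        return True, "local_unavailable"               # local model is down
    hits = [k for k in ESCALATION_KEYWORDS if k in task.lower()]
    if hits:
        return True, "complexity:" + ",".join(hits)    # complexity keyword detected
    if confidence < threshold:
        return True, f"low_confidence:{confidence:.2f}"
    return False, "local"                              # confident and simple: stay local
```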
| File | Purpose |
|---|---|
| `mux/cli.py` | CLI entrypoint (`run`, `doctor`, `status`) |
| `mux/mcp_server.py` | MCP tool server |
| `mux/router.py` | Routing logic and escalation paths |
| `mux/providers/local_provider.py` | Local model client |
| `mux/providers/local_executor.py` | Local implementation materialization path |
| `mux/providers/sota_provider.py` | CLI-based SOTA routing/execution |
| `mux/local_runtime.py` | Local runtime health/recovery diagnostics |
| `mux/doctor.py` | Health checks for runtime and provider CLIs |
| `mux/ledger.py` | JSONL ledger, rotation, and summaries |
| `mux/status.py` | Unified operational status payload |
| `config/mux.yaml` | Primary configuration |
git clone https://github.com/dgdev25/mux.git
cd mux
./setup.sh

The setup script handles everything: Python version check, virtual environment creation, dependency installation, and automatic configuration of mux in Claude Code, Codex, or Gemini (whichever you have installed). After it completes, restart your CLI tool and you're done.
Requirements:
- Python 3.10+
- At least one cloud backend CLI installed (for escalation):
  - `claude` (Claude Code CLI)
  - `codex` (OpenAI Codex)
  - `gemini` (Google Gemini CLI)
Important: If you don't set up a local model, mux will still work but will escalate all tasks to cloud backends. Local models give you the benefits of low latency and privacy.
After running ./setup.sh, configure your local model server and update config/mux.yaml with the endpoint details.
For best inference quality and speed, set up Qwen 35B using the specialized installer:
# See the detailed setup guide:
cat docs/qwen-35b-setup.md

This uses ik_llama.cpp with a quantized Qwen 35B model. The setup is more involved but provides significantly better results than smaller models.
Reference benchmark: abovespec/local-llm-benchmarks
Once running, setup.sh will auto-detect it at http://127.0.0.1:18473/v1.
- Ollama (simplest): Download from ollama.ai — works with Mistral 7B, Llama, and other models. Update `config/mux.yaml` to point at your Ollama instance.
- LM Studio (GUI-friendly): Download from lmstudio.ai — manage models visually; exposes the same OpenAI-compatible API.
- vLLM or Text Generation WebUI: Advanced setups for production use.
The key requirement: your local model server must expose an OpenAI-compatible `/v1/chat/completions` endpoint. Nearly all modern local model servers support this.
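If you're unsure whether your server speaks that protocol, a quick probe like the sketch below should return a completion. The base URL and model id are placeholders; adjust them to match your server and `config/mux.yaml`.

```python
# Smoke test for an OpenAI-compatible /v1/chat/completions endpoint.
# Base URL and model id are placeholders -- set them to match your local server.
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18473/v1"   # or e.g. http://127.0.0.1:11434/v1 for Ollama
payload = {
    "model": "local-model",               # placeholder; use your server's model id
    "messages": [{"role": "user", "content": "Reply with the single word: ok"}],
    "max_tokens": 8,
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```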
- Advisory tasks (read-only, no file changes)
  - Examples: refactoring explanations, code reviews, architecture discussions
  - Route: local model → confidence check → escalate if needed
- Persistence tasks (create/modify files)
  - Examples: `create a project in /tmp/app`, `build a CLI in /data/tool`
  - Route: local executor → file materialization → optional SOTA review → optional SOTA fallback
  - Detected by keywords (build, create, scaffold, project) or absolute paths in the task prompt
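As a rough illustration, persistence detection can be pictured like this. The helper is hypothetical, not mux's internals; the keyword list matches the one above.

```python
# Hypothetical sketch of persistence-task detection: absolute path or intent keyword.
# Not mux's actual detection code; the keyword list matches the documentation above.
import re

PERSISTENCE_KEYWORDS = {"build", "create", "scaffold", "project"}
ABS_PATH = re.compile(r"(^|\s)/[\w.\-/]+")    # e.g. "/tmp/my-app"

def is_persistence_task(task: str) -> bool:
    lowered = task.lower()
    return bool(ABS_PATH.search(task)) or any(k in lowered for k in PERSISTENCE_KEYWORDS)

# is_persistence_task("Create a Python CLI project in /tmp/my-app")  -> True
# is_persistence_task("Explain the caching layer")                   -> False
```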
Local model escalates when any of these occur:
- Confidence drops below 0.6 (configurable)
- Complexity keywords found: `security`, `architecture`, `migration`, `multi-file`, `refactor`, `concurrency`, `performance`, `ambiguous`
- Verification fails repeatedly
Local executor can request SOTA review after file creation, and optionally fall back to SOTA write if local materialization fails.
Confidence is determined by:
- Response length (longer = more thought)
- Completion reason (`stop` and `length` are good; others lower confidence)
- Task complexity
Adjust router.local_confidence_threshold in config/mux.yaml to be more/less aggressive about escalation.
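A rough sketch of how such a heuristic could combine those signals follows. The weights are invented for illustration; mux's actual scoring may differ, and escalation behavior is tuned via `router.local_confidence_threshold`, not these numbers.

```python
# Illustrative confidence heuristic combining the signals listed above.
# The weights are invented for the sketch; tune escalation via
# router.local_confidence_threshold rather than relying on these numbers.
def estimate_confidence(output: str, finish_reason: str, complexity_hits: int) -> float:
    score = 0.5
    if len(output) > 400:                     # longer responses suggest more reasoning shown
        score += 0.2
    if finish_reason in ("stop", "length"):   # clean completion reasons
        score += 0.2
    else:                                     # e.g. filtered or truncated abnormally
        score -= 0.2
    score -= 0.1 * complexity_hits            # each complexity keyword lowers confidence
    return max(0.0, min(1.0, score))
```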
python3 -m mux.cli run --task "Fix the JWT validation bug in auth/middleware.py"Returns:
route:localor{backend}(e.g.,claude,codex)model: Which model handled the taskreason: Why this route was chosenrun_id: Unique ID for logging/trackingoutput: The model's response
python3 -m mux.cli doctor

Verifies:
- Local model available at configured health URL
- Restart command is valid (if configured)
- Cloud backend CLIs installed and accessible
- Network connectivity
Run this if tasks aren't behaving as expected.
python3 -m mux.cli status

Shows:
- Local model health (UP/DOWN)
- Recent task counts by route
- Common failure reasons from last 200 tasks
- Ledger file location
Use this to spot trends (e.g., too many escalations, repeated errors).
. .venv/bin/activate
python3 -m mux.mcp_server

This starts a stdio server that listens for requests from MCP clients.
Add to your Claude Code settings.json:
{
"mcpServers": {
"mux": {
"command": "python3",
"args": ["-m", "mux.mcp_server"],
"cwd": "/path/to/mux"
}
}
}

Or copy the template from mcp-config.example.json and adjust paths.
- `mux(task: str)` — Route a single task
- `mux_json(payload_json: str)` — Send a JSON payload with options
- `mux_doctor()` — Health check
- `mux_health()` — Quick health status
Be outcome-focused; let mux decide routing:
✅ Good:
- Harden JWT auth middleware, keep API behavior unchanged, add regression tests
- Refactor caching layer for readability, preserve performance, run tests
- Build a Python CLI in /tmp/report-cli with README, tests, and argument parsing
- Investigate flaky integration tests and propose minimal safe fix
❌ Avoid:
- `Use Claude to refactor...` (let mux choose the backend)
- `Implement X in 100 tokens` (constraints pre-judge routing)
- `Fix this in local mode` (mux decides when to escalate)
Example JSON payload with routing options:

{
"task": "Build a scientific calculator in /tmp/calculator with Python CLI, safe expression eval, README, and tests",
"prefer_local": false,
"verify_with": "claude"
}

All settings are in config/mux.yaml. The defaults work for most users.
To use a different local model:
providers:
local:
base_url: "http://127.0.0.1:11434/v1" # Your model server URL
model: "mistral:7b" # Your model nameTo escalate more aggressively (safer, but more cloud usage):
router:
local_confidence_threshold: 0.8 # Default 0.6; higher = escalate sooner

To trust the local model more (faster, riskier):
router:
local_confidence_threshold: 0.4 # Lower = keep more tasks local

To set up automatic restart of your local model service:
local_runtime:
restart_cmd: "/home/user/models/restart-ollama.sh" # Custom script or systemctl

To change where logs are stored:
ledger:
file_path: "/var/log/mux/run_ledger.jsonl"

| Section | Purpose | Default |
|---|---|---|
| `router` | Escalation thresholds and keywords | 0.6 confidence; security, architecture escalate |
| `local_runtime` | Health URL and restart strategy | http://127.0.0.1:18473/health |
| `providers.local` | Local model endpoint details | Qwen 35B on port 18473 |
| `providers.sota_cli` | Cloud backends (Claude, Codex, Gemini) | All available, Claude first |
| `workflow` | How persistence tasks are handled | Escalate, review, no fallback |
| `ledger` | Logging and observability | ~2MB rotation, keep 5 files |
See comments in config/mux.yaml for every option.
Every task is logged to a JSON ledger for auditing and analytics.
Default location: logs/run_ledger.jsonl
View recent tasks:
# Last 5 tasks (pretty-printed)
tail -5 logs/run_ledger.jsonl | jq .
# All local tasks
grep '"route":"local"' logs/run_ledger.jsonl | jq .
# All escalations (with reasons)
grep -v '"route":"local"' logs/run_ledger.jsonl | jq '{route, reason, model}'
# Count by route in last 100 tasks
tail -100 logs/run_ledger.jsonl | jq -s 'group_by(.route) | map({route: .[0].route, count: length})'

Ledger fields:
- `run_id`: Unique identifier
- `route`: `local` or backend name (`claude`, `codex`, `gemini`)
- `reason`: Why this route was chosen
- `model`: Which model handled it
- `confidence`: Confidence score (0–1)
- `timestamp`: When it ran
- `task_length`: Characters in task
- `output_length`: Characters in response
Automatic rotation: Ledger rotates at ~2MB and keeps 5 historical files.
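If you prefer Python to jq, a small script along these lines (assuming the default ledger path and the field names listed above) produces a quick summary:

```python
# Summarize recent routes and escalation reasons from the JSONL ledger.
# Assumes the default ledger path and the field names documented above.
import json
from collections import Counter
from pathlib import Path

LEDGER = Path("logs/run_ledger.jsonl")

routes, reasons = Counter(), Counter()
for line in LEDGER.read_text().splitlines()[-200:]:   # last 200 tasks
    if not line.strip():
        continue
    entry = json.loads(line)
    routes[entry.get("route", "unknown")] += 1
    if entry.get("route") != "local":
        reasons[entry.get("reason", "unknown")] += 1

print("routes:", dict(routes))
print("escalation reasons:", dict(reasons))
```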
pytest -q # Run all tests
pytest -v # Verbose output
pytest tests/test_policy.py # Run one test file

Test coverage includes:
- Escalation routing logic (when does local stay local?)
- Local runtime health checks and recovery
- Ledger persistence and rotation
- File materialization and security
- Cloud backend fallback behavior
All tests use mocks and don't hit real APIs or models.
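As an illustration of that style, a policy test might look like the sketch below. It reuses the hypothetical `should_escalate` helper from the routing sketch earlier in this README; it is not taken from mux's real test suite.

```python
# Illustration of the mock-free, offline testing style: pure functions, no model or API calls.
# Reuses the hypothetical should_escalate helper from the routing sketch above.
def test_complexity_keyword_escalates():
    escalate, reason = should_escalate(
        "Plan the database migration for the billing service",
        local_up=True, confidence=0.9,
    )
    assert escalate and reason.startswith("complexity")

def test_confident_simple_task_stays_local():
    escalate, reason = should_escalate(
        "Fix the typo in README", local_up=True, confidence=0.8,
    )
    assert not escalate and reason == "local"
```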
python3 -m mux.cli doctor

If local runtime shows DOWN:
- Check your model server is running:
  curl -s http://127.0.0.1:18473/health # or your configured port
- Verify the URL in config:
  grep health_url config/mux.yaml
- If using Ollama, make sure it's running:
  ollama serve # or check systemctl status ollama
- Check logs:
  tail -20 logs/run_ledger.jsonl | jq 'select(.route != "local") | {reason, confidence}'
Make sure your task includes:
- An absolute path (e.g., `/tmp/my-app`) or a path relative to the current dir
- Keywords like `create`, `build`, `scaffold`, `project`
Example task:
python3 -m mux.cli run --task "Create a Python project in /tmp/test-app with main.py and tests"Check the ledger reason if it doesn't create files:
tail -1 logs/run_ledger.jsonl | jq '{reason, route}'

Verify the backend CLIs are installed:

which claude # or which codex, which gemini

If not installed, use your package manager or visit:
- Claude: https://github.com/anthropics/claude-code
- Codex: https://github.com/openai/codex-cli
- Gemini: https://github.com/google-gemini/google-cloud-sdk
Lower the confidence threshold:
router:
local_confidence_threshold: 0.4 # Default 0.6

Or check if complexity keywords are triggering escalation:
grep '"reason":"complexity' logs/run_ledger.jsonl | tail -10Local model safety:
- Your code never leaves your machine (good for IP, bad for capability)
- Local models can be weaker than cloud models — expect more escalations
- No API keys needed for local mode
File materialization safety:
- Only files in the configured work directory can be created
- Path traversal (`../`) is rejected
- Absolute paths outside the intended directory are rejected
- All file operations are logged
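Conceptually, the path check works like the sketch below. It is illustrative only, not mux's actual materialization code.

```python
# Illustrative path-safety check: resolve the candidate and require it to stay
# inside the configured work directory. Not mux's actual materialization code.
from pathlib import Path

def is_safe_target(work_dir: str, candidate: str) -> bool:
    root = Path(work_dir).resolve()
    target = (root / candidate).resolve()     # absolute candidates replace root entirely
    return target.is_relative_to(root)        # Python 3.9+; rejects ../ escapes

# is_safe_target("/tmp/my-app", "src/main.py")       -> True
# is_safe_target("/tmp/my-app", "../../etc/passwd")  -> False
# is_safe_target("/tmp/my-app", "/etc/passwd")       -> False
```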
Cloud escalation safety:
- Cloud backend CLIs use your configured API keys (set via env vars)
- Review what commands are actually being executed (check logs)
- `mux` doesn't modify your actual `claude`, `codex`, or `gemini` configurations
Running in production:
- Start with local model only, no cloud escalation
- Gradually expand escalation as you build confidence
- Monitor the ledger for what's staying local vs. escalating
- Set `router.local_confidence_threshold` conservatively (higher = safer)
Common workflows:
- Adding a new config option: Update `config/mux.yaml`, add env var support, update `types.py`, add a test
- Changing routing logic: Update `router.py`, add a test in `test_policy.py`, update the README routing section
- Adding a new backend: Add to `providers/sota_provider.py`, configure in `mux.yaml`, test
Before committing:
pytest -q # All tests pass
python3 -m mux.cli doctor # Doctor works
python3 -m mux.cli status # Status works