Model Lab is the planned open-model workbench for CodeWhale. The north star is simple: CodeWhale should become the best terminal coding agent for open-source and open-weight models across every provider that offers them. Model Lab is how those models become discoverable, evaluable, routable, servable, and exportable without weakening the current terminal-agent contract: local workspace control, explicit provider auth, approval gates, and clear privacy boundaries.
This document is roadmap language. It does not mean every workset below is implemented today.
- DeepSeek is the first-class default provider today, with
deepseek-v4-pro,deepseek-v4-flash, streaming thinking blocks, Fin routing,DEEPSEEK_*environment variables, and~/.deepseekconfig compatibility. - OpenRouter, Novita, Fireworks, NVIDIA NIM, AtlasCloud, Wanjie Ark, Hugging
Face Inference Providers, generic OpenAI-compatible endpoints, SGLang, vLLM,
and Ollama are supported provider paths where their IDs appear in
/provider,codewhale --provider, orcodewhale models. - Model auto-routing chooses a concrete DeepSeek model and thinking level per turn. It is not a TUI mode.
- Fin is the fast
deepseek-v4-flashthinking-off path for routing, summaries, cheap checks, RLM child calls, wakeup verification, and binary-completion checks. - Self-hosted OpenAI-compatible endpoints can be used through SGLang, vLLM,
Ollama, or the generic
openaiprovider configuration.
- A native Hugging Face Hub browser, model passport picker, or direct Hub search workflow. The OpenAI-compatible Hugging Face Inference Providers route is implemented separately as a chat provider.
- Built-in Hugging Face model card, dataset, adapter, safetensors, Spaces, or Jobs workflows.
- Native Unsloth, NeMo, or Arcee integrations.
- A dedicated Model Lab UI tab.
- Built-in benchmark suites, eval leaderboards, hosted observability, or training-infrastructure orchestration.
Until those land, use the provider paths above, MCP servers, or external workflows explicitly configured by the user.
Model Lab should help users answer practical questions:
- Which model should handle this turn?
- Which open or open-weight model can I run locally or through a trusted provider?
- Which provider offers this model with the latency, price, context window, license, and privacy posture I need?
- What did this model cost, how did it perform, and what data left my machine?
- Can I reproduce, export, or self-host the route?
It should never hide provider boundaries, silently upload local artifacts, or describe a model as available before CodeWhale can actually route to it.
Planned scope:
- Hub API auth and model discovery.
- Model cards, licenses, tags, safetensors metadata, adapters, and dataset links surfaced in a terminal-friendly way.
- Native Hub browser and model-passport metadata on top of the already separate Hugging Face Inference Providers chat route.
- Hugging Face Jobs as an optional remote execution path for user-approved experiments.
Non-goal for now: claiming native Hub search, model passports, Spaces/Jobs, or Model Lab UI exists before those surfaces are implemented in code.
Planned scope:
- Fine-tuning recipes and adapter workflows for users who already own the data and compute path.
- Export guidance that keeps dataset, adapter, and checkpoint locations explicit.
- Compatibility notes for models that can return to local serving or a hosted OpenAI-compatible endpoint.
Planned scope:
- Training and alignment workflow notes for users operating NVIDIA-centric infrastructure.
- Clear boundaries between NVIDIA NIM inference support that exists today and future NeMo training or customization workflows.
Planned scope:
- Small-model routing and specialization experiments.
- Exportable routes that make it clear when a task is handled by a smaller model, Fin, or full DeepSeek reasoning.
Planned scope:
- Better local and private serving ergonomics for SGLang, vLLM, Ollama, and OpenAI-compatible gateways.
- Health checks, model listing, context-window metadata, and route validation.
- No silent network exposure: public endpoints must be configured explicitly.
Planned scope:
- Reproducible task suites for coding, review, docs, release checks, and long-context workflows.
- Side-by-side route comparisons where the exact model, provider, thinking level, prompt, and tool policy are captured.
Planned scope:
- Local-first traces for turn routing, tool calls, approvals, cost, cache behavior, and context pressure.
- Export rules that redact secrets and require explicit user action before data leaves the machine.
Planned scope:
- Recipes for dataset preparation, adapter training, artifact naming, and promotion into serving.
- Separation between local/private artifacts and anything published to a hub or registry.
- Local files, prompts, transcripts, traces, model outputs, eval results, adapters, datasets, and checkpoints should remain local unless the user explicitly chooses a provider or export destination.
- Provider auth must remain explicit.
DEEPSEEK_*, OpenRouter, Hugging Face, and self-hosted credentials should not be inferred from unrelated config. - Exportable artifacts should include provenance: source model, provider, route, tool policy, eval inputs, and redaction status.
- Public sharing, hosted telemetry, sponsorship badges, and external branding require maintainer approval.