Skip to content

Latest commit

 

History

History
148 lines (110 loc) · 5.64 KB

File metadata and controls

148 lines (110 loc) · 5.64 KB

Model Lab Roadmap

Model Lab is the planned open-model workbench for CodeWhale. The north star is simple: CodeWhale should become the best terminal coding agent for open-source and open-weight models across every provider that offers them. Model Lab is how those models become discoverable, evaluable, routable, servable, and exportable without weakening the current terminal-agent contract: local workspace control, explicit provider auth, approval gates, and clear privacy boundaries.

This document is roadmap language. It does not mean every workset below is implemented today.

Implemented Today

  • DeepSeek is the first-class default provider today, with deepseek-v4-pro, deepseek-v4-flash, streaming thinking blocks, Fin routing, DEEPSEEK_* environment variables, and ~/.deepseek config compatibility.
  • OpenRouter, Novita, Fireworks, NVIDIA NIM, AtlasCloud, Wanjie Ark, Hugging Face Inference Providers, generic OpenAI-compatible endpoints, SGLang, vLLM, and Ollama are supported provider paths where their IDs appear in /provider, codewhale --provider, or codewhale models.
  • Model auto-routing chooses a concrete DeepSeek model and thinking level per turn. It is not a TUI mode.
  • Fin is the fast deepseek-v4-flash thinking-off path for routing, summaries, cheap checks, RLM child calls, wakeup verification, and binary-completion checks.
  • Self-hosted OpenAI-compatible endpoints can be used through SGLang, vLLM, Ollama, or the generic openai provider configuration.

Not Implemented Yet

  • A native Hugging Face Hub browser, model passport picker, or direct Hub search workflow. The OpenAI-compatible Hugging Face Inference Providers route is implemented separately as a chat provider.
  • Built-in Hugging Face model card, dataset, adapter, safetensors, Spaces, or Jobs workflows.
  • Native Unsloth, NeMo, or Arcee integrations.
  • A dedicated Model Lab UI tab.
  • Built-in benchmark suites, eval leaderboards, hosted observability, or training-infrastructure orchestration.

Until those land, use the provider paths above, MCP servers, or external workflows explicitly configured by the user.

Model Lab Principle

Model Lab should help users answer practical questions:

  • Which model should handle this turn?
  • Which open or open-weight model can I run locally or through a trusted provider?
  • Which provider offers this model with the latency, price, context window, license, and privacy posture I need?
  • What did this model cost, how did it perform, and what data left my machine?
  • Can I reproduce, export, or self-host the route?

It should never hide provider boundaries, silently upload local artifacts, or describe a model as available before CodeWhale can actually route to it.

Hugging Face Workset

Planned scope:

  • Hub API auth and model discovery.
  • Model cards, licenses, tags, safetensors metadata, adapters, and dataset links surfaced in a terminal-friendly way.
  • Native Hub browser and model-passport metadata on top of the already separate Hugging Face Inference Providers chat route.
  • Hugging Face Jobs as an optional remote execution path for user-approved experiments.

Non-goal for now: claiming native Hub search, model passports, Spaces/Jobs, or Model Lab UI exists before those surfaces are implemented in code.

Unsloth Workset

Planned scope:

  • Fine-tuning recipes and adapter workflows for users who already own the data and compute path.
  • Export guidance that keeps dataset, adapter, and checkpoint locations explicit.
  • Compatibility notes for models that can return to local serving or a hosted OpenAI-compatible endpoint.

NeMo Workset

Planned scope:

  • Training and alignment workflow notes for users operating NVIDIA-centric infrastructure.
  • Clear boundaries between NVIDIA NIM inference support that exists today and future NeMo training or customization workflows.

Arcee Workset

Planned scope:

  • Small-model routing and specialization experiments.
  • Exportable routes that make it clear when a task is handled by a smaller model, Fin, or full DeepSeek reasoning.

Serving Workset

Planned scope:

  • Better local and private serving ergonomics for SGLang, vLLM, Ollama, and OpenAI-compatible gateways.
  • Health checks, model listing, context-window metadata, and route validation.
  • No silent network exposure: public endpoints must be configured explicitly.

Eval Workset

Planned scope:

  • Reproducible task suites for coding, review, docs, release checks, and long-context workflows.
  • Side-by-side route comparisons where the exact model, provider, thinking level, prompt, and tool policy are captured.

Observability Workset

Planned scope:

  • Local-first traces for turn routing, tool calls, approvals, cost, cache behavior, and context pressure.
  • Export rules that redact secrets and require explicit user action before data leaves the machine.

Training Infra Workset

Planned scope:

  • Recipes for dataset preparation, adapter training, artifact naming, and promotion into serving.
  • Separation between local/private artifacts and anything published to a hub or registry.

Privacy And Export Rules

  • Local files, prompts, transcripts, traces, model outputs, eval results, adapters, datasets, and checkpoints should remain local unless the user explicitly chooses a provider or export destination.
  • Provider auth must remain explicit. DEEPSEEK_*, OpenRouter, Hugging Face, and self-hosted credentials should not be inferred from unrelated config.
  • Exportable artifacts should include provenance: source model, provider, route, tool policy, eval inputs, and redaction status.
  • Public sharing, hosted telemetry, sponsorship badges, and external branding require maintainer approval.