Cacheon is a Bittensor subnet (SN14) that runs an open competition for production-grade LLM inference optimization. Miners submit containerized inference servers. Validators evaluate them against a vLLM baseline on the same hardware. The fastest correct server takes all emission.
V1 arena: Qwen2.5-72B-Instruct on 4x H200 or equivalent GPUs. Beat the pinned vLLM baseline on TTFT and throughput while passing a greedy-decoding correctness gate.
- Miners build an inference server, package it as a Docker image, and then commit the image reference and image digest on-chain.
- Validators scan the chain for new commitments, pull the image, and run it with model weights mounted at `/models`.
- Scoring measures TTFT and throughput improvement over the vLLM baseline. Correctness is checked first -- fail it and the score is zero.
- The fastest correct server becomes king and earns all subnet emission until someone beats it.
- Challengers must exceed the king's score by a small decaying margin (~1% at crowning, decaying to 0 over ~7 days) to prevent noise-driven churn.
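The decaying margin can be sketched as follows. This is an illustrative sketch only: the `required_score` helper and the linear decay shape are assumptions, not the subnet's actual schedule.

```python
def required_score(king_score: float, days_since_crowning: float,
                   initial_margin: float = 0.01, decay_days: float = 7.0) -> float:
    """Score a challenger must exceed to dethrone the king.

    The margin starts at ~1% at crowning and decays to 0 over ~7 days
    (a linear decay is assumed here for illustration).
    """
    remaining = max(0.0, 1.0 - days_since_crowning / decay_days)
    return king_score * (1.0 + initial_margin * remaining)
```

At crowning a challenger needs roughly 1% more than the king's score; a week later, any strictly better score suffices.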
Score formula:
```python
if not correctness_pass:
    score = 0.0
else:
    ttft_imp = max(0, (baseline_ttft - miner_ttft) / baseline_ttft)
    tps_imp = max(0, (miner_tps - baseline_tps) / baseline_tps)
    score = 0.5 * ttft_imp + 0.5 * tps_imp
```

Build an inference server that serves Qwen2.5-72B-Instruct via `/v1/chat/completions` with streaming and logprobs. Package it as a Docker image (maximum 20 GB; model weights are mounted at runtime, not baked into the image). Push it to a public registry and commit on-chain.
Requirements: public container registry, Bittensor wallet registered on SN14. GPU hardware is only needed for local testing.
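Before committing (commits are one shot per hotkey), it is worth smoke-testing the server locally against the expected API shape. A minimal sketch, using only the stdlib; the local URL, port, and model name are assumptions for illustration, and the request fields follow the OpenAI-style contract named above:

```python
import json
import urllib.request

def build_chat_request(prompt: str) -> dict:
    # The two features the arena exercises: streaming and logprobs.
    return {
        "model": "Qwen2.5-72B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "logprobs": True,
        "temperature": 0.0,  # greedy decoding, matching the correctness gate
    }

def smoke_test(base_url: str = "http://localhost:8000") -> None:
    """POST one streaming request and print the raw SSE chunks."""
    body = json.dumps(build_chat_request("Say hi")).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one SSE chunk per line
            print(line.decode().strip())
```

If the server streams chunks and includes logprobs under greedy decoding, it is at least shaped correctly for the evaluator.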
```shell
# Push your image
docker tag my-server:latest docker.io/myuser/cacheon-miner:v1
docker push docker.io/myuser/cacheon-miner:v1
```
```shell
# Commit on-chain (one shot per hotkey -- test locally first)
python miner/commit.py \
  --wallet-name <wallet> \
  --wallet-hotkey <hotkey> \
  --image "docker.io/myuser/cacheon-miner:v1" \
  --digest "sha256:..." \
  --network finney \
  --netuid 14
```

Full guide: cacheon.ai/docs/miners/overview
The validator has two components: an always-on CPU host (chain scanning, weight setting) and an ephemeral GPU pod (eval). The GPU pod is rented on-demand only when challengers are queued.
GPU requirements: NVLink/SXM interconnect, 4x H200 or equivalent, 400 GB storage, model weights at /workspace/models/Qwen2.5-72B-Instruct.
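The two-component split amounts to an always-on loop on the CPU host that only pays for GPU time when there is work queued. A heavily simplified sketch; every function name here is a hypothetical placeholder, not the actual validator API:

```python
def host_tick(scan_chain, rent_gpu_pod, set_weights) -> bool:
    """One iteration of the CPU host's loop.

    scan_chain()          -> list of pending challenger commitments (placeholder)
    rent_gpu_pod(queue)   -> dict of hotkey -> score from the ephemeral
                             GPU eval pod (placeholder)
    set_weights(scores)   -> writes weights on-chain (placeholder)

    Returns True if an eval ran this tick.
    """
    queue = scan_chain()
    if not queue:
        return False  # nothing queued: no GPU is rented
    # The GPU pod exists only for the duration of this eval run.
    scores = rent_gpu_pod(queue)
    set_weights(scores)
    return True
```

The design point this captures: GPU cost scales with challenger activity, while chain scanning and weight setting stay on cheap always-on hardware.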
```shell
# CPU host (always-on)
git clone https://github.com/latent-to/cacheon
cd cacheon
cp .env.example .env  # add wallet and S3 config
docker compose up --build
```

```shell
# GPU pod (on-demand, run when challengers appear)
bash scripts/gpu_setup/setup.sh
docker compose -f docker-compose.gpu.yml up --build
```

Full guide: cacheon.ai/docs/validators/overview
| | Miners | Validators | Evaluation |
|---|---|---|---|
| Start here | Overview | Overview | Scoring |
| Reference | API contract | Architecture | Harness |
| Setup | Quickstart | GPU pod setup | Prompts |
| Rules | Rules | CPU host setup | Roadmap |
MIT