Codex load balancer is a pragmatic reverse proxy and load balancer for Codex. It aggregates multiple ChatGPT auth tokens, keeps usage in memory, and selects the best token per request to avoid rate limits.
- Token directory scan on startup and hot reload (polling).
- Usage sync at startup and every 5 minutes by default.
- Load balancing with weekly limit priority and 5-hour health degradation.
- Session stickiness via common headers.
- Automatic failover on rate limit responses.
- WebSocket upgrade proxy support.
- Per-request token usage persistence (input/cached/output) to SQLite.
- Built-in web dashboard for global/account usage and quota status.
- Stats dashboard for internal usage.
- Go 1.25+
go build -o codex-load-balancer .
./codex-load-balancer \
--api-key your-api-key \
--data-dir ./data \
--port 8080 \
--sync-interval 5m \
--sync-concurrency 8
Flags:
- --api-key (required): API key for protected proxy endpoints.
- --data-dir (required): Directory containing active *.json auth files.
- --port (optional): Listen port. Default 8080.
- --sync-interval (optional): Usage sync interval. Default 5m.
- --sync-concurrency (optional): Usage sync concurrency. Default 8.
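The flags and defaults listed above could be parsed with Go's standard `flag` package, as in the minimal sketch below. The `Config` type and `parseFlags` helper are hypothetical names chosen for illustration; the project may wire its flags differently.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"time"
)

// Config holds the documented runtime settings (hypothetical type).
type Config struct {
	APIKey          string
	DataDir         string
	Port            int
	SyncInterval    time.Duration
	SyncConcurrency int
}

// parseFlags parses args (without the program name) using the documented
// flag names and defaults. It is an illustrative helper, not project code.
func parseFlags(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("codex-load-balancer", flag.ContinueOnError)
	fs.StringVar(&cfg.APIKey, "api-key", "", "API key for protected proxy endpoints")
	fs.StringVar(&cfg.DataDir, "data-dir", "", "directory containing *.json auth files")
	fs.IntVar(&cfg.Port, "port", 8080, "listen port")
	fs.DurationVar(&cfg.SyncInterval, "sync-interval", 5*time.Minute, "usage sync interval")
	fs.IntVar(&cfg.SyncConcurrency, "sync-concurrency", 8, "usage sync concurrency")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	// Both flags are documented as required, so reject empty values.
	if cfg.APIKey == "" || cfg.DataDir == "" {
		return cfg, errors.New("--api-key and --data-dir are required")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Using a dedicated `flag.FlagSet` rather than the global one keeps the parser easy to exercise with arbitrary argument slices.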
Put credential *.json files in ./data, then start the service:
CLB_API_KEY=your-api-key docker compose up -d --build
By default, Compose publishes 8080:8080. Override the host port when needed:
CLB_API_KEY=your-api-key CLB_PORT=9090 docker compose up -d --build
Compose passes runtime settings through environment variables:
- CLB_API_KEY (required): API key for protected proxy endpoints.
- CLB_PORT (optional): Host port to publish. Default 8080.
- CLB_LISTEN_PORT (optional): Container listen port. Default 8080.
- CLB_DATA_DIR (optional): Container data directory. Default /app/data.
- CLB_SYNC_INTERVAL (optional): Usage sync interval. Default 5m.
- CLB_SYNC_CONCURRENCY (optional): Usage sync concurrency. Default 8.
Notes:
- Usage sync and dashboard state are stored in data-dir/clb.db.
- The service no longer reads a TOML config file.
Codex load balancer stores Codex credential JSON. The proxy reads .tokens.access_token, .tokens.account_id, .tokens.refresh_token, optional .tokens.id_token, and .last_refresh from each *.json file. If id_token is present, its unsigned JWT claims are used only as a local hint for user_id and email; upstream requests still use .tokens.account_id for ChatGPT-Account-ID.
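As a rough illustration of the fields named above, a credential file could be decoded into a Go struct as sketched here. The `Credential` type, the `parseCredential` and `unverifiedClaims` helpers, and the required-field check are assumptions made for this sketch, and the `email` claim name is taken only from the description above. The claims are decoded without any signature verification, which is why they should serve only as a local hint.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// Credential mirrors the fields the proxy is described as reading.
type Credential struct {
	LastRefresh time.Time `json:"last_refresh"`
	Tokens      struct {
		IDToken      string `json:"id_token"` // optional
		AccessToken  string `json:"access_token"`
		RefreshToken string `json:"refresh_token"`
		AccountID    string `json:"account_id"`
	} `json:"tokens"`
}

// parseCredential decodes the contents of one *.json credential file.
// Rejecting files without access_token or account_id is an assumption.
func parseCredential(data []byte) (Credential, error) {
	var c Credential
	if err := json.Unmarshal(data, &c); err != nil {
		return c, err
	}
	if c.Tokens.AccessToken == "" || c.Tokens.AccountID == "" {
		return c, errors.New("credential is missing access_token or account_id")
	}
	return c, nil
}

// unverifiedClaims decodes a JWT payload WITHOUT checking its signature.
func unverifiedClaims(jwt string) (map[string]any, error) {
	parts := strings.Split(jwt, ".")
	if len(parts) != 3 {
		return nil, errors.New("not a three-part JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, err
	}
	var claims map[string]any
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, err
	}
	return claims, nil
}

func main() {
	raw := []byte(`{"last_refresh":"2026-03-30T16:00:00Z",
		"tokens":{"access_token":"a","refresh_token":"r","account_id":"account_123"}}`)
	c, err := parseCredential(raw)
	fmt.Println(c.Tokens.AccountID, c.LastRefresh.Year(), err)

	// Build a throwaway token whose payload carries a sample email claim.
	payload := base64.RawURLEncoding.EncodeToString([]byte(`{"email":"a@b.c"}`))
	claims, err := unverifiedClaims("e30." + payload + ".sig")
	fmt.Println(claims["email"], err)
}
```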
Example:
{
  "auth_mode": "chatgpt",
  "last_refresh": "2026-03-30T16:00:00Z",
  "created_at": "2026-03-30T16:00:00Z",
  "tokens": {
    "id_token": "...",
    "access_token": "...",
    "refresh_token": "...",
    "account_id": "account_123"
  }
}
- Allowed paths: /responses, /v1/responses, /models, and /v1/models only.
- /v1/responses and /v1/models are normalized by stripping /v1 upstream.
- Most request headers are preserved; Authorization is replaced and Accept-Encoding is removed so the proxy can inspect upstream response bodies.
- For WebSocket upstream requests, Sec-WebSocket-Extensions is stripped so usage frames stay observable as plain JSON (no per-message compression).
- Upstream base URL: https://chatgpt.com/backend-api/codex.
- WebSocket (Upgrade: websocket) requests are proxied through the selected token.
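The path and header rules above can be sketched in a few lines of Go. The helper names `upstreamURL` and `upstreamHeaders` are hypothetical, and the `Bearer` scheme for the replaced `Authorization` header is an assumption; the text only states that the header is replaced.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

const upstreamBase = "https://chatgpt.com/backend-api/codex"

// upstreamURL maps an allowed request path to its upstream URL,
// stripping the /v1 prefix. It reports false for any other path.
func upstreamURL(path string) (string, bool) {
	switch path {
	case "/responses", "/v1/responses", "/models", "/v1/models":
		return upstreamBase + strings.TrimPrefix(path, "/v1"), true
	}
	return "", false
}

// upstreamHeaders copies the client headers and applies the documented
// rewrites. The "Bearer " prefix is an assumption of this sketch.
func upstreamHeaders(in http.Header, accessToken string, websocket bool) http.Header {
	out := in.Clone()
	if out == nil {
		out = http.Header{}
	}
	out.Set("Authorization", "Bearer "+accessToken)
	// Drop Accept-Encoding so upstream bodies arrive uncompressed.
	out.Del("Accept-Encoding")
	if websocket {
		// Without extensions, frames are not per-message compressed.
		out.Del("Sec-WebSocket-Extensions")
	}
	return out
}

func main() {
	u, ok := upstreamURL("/v1/responses")
	fmt.Println(u, ok)

	h := upstreamHeaders(http.Header{"Accept-Encoding": {"gzip"}}, "token", false)
	fmt.Println(h.Get("Authorization"), h.Get("Accept-Encoding") == "")
}
```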
If a request includes one of the following headers, Codex load balancer binds that session to a token:
session_id
If the bound token hits a limit error, Codex load balancer unbinds and reselects.
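One way to picture the binding described above is a small in-memory table keyed by session ID. This is a sketch under assumptions: the `stickyTable` type and its methods are hypothetical, and token IDs are plain strings here.

```go
package main

import (
	"fmt"
	"sync"
)

// stickyTable maps a session ID to the token ID it is bound to.
type stickyTable struct {
	mu        sync.Mutex
	bySession map[string]string
}

func newStickyTable() *stickyTable {
	return &stickyTable{bySession: make(map[string]string)}
}

// bind pins a session to a token.
func (s *stickyTable) bind(sessionID, tokenID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.bySession[sessionID] = tokenID
}

// lookup returns the token bound to a session, if any.
func (s *stickyTable) lookup(sessionID string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	tokenID, ok := s.bySession[sessionID]
	return tokenID, ok
}

// unbindToken clears every session bound to a token, for example after
// that token hits a limit error, so those sessions reselect.
func (s *stickyTable) unbindToken(tokenID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for sessionID, bound := range s.bySession {
		if bound == tokenID {
			delete(s.bySession, sessionID)
		}
	}
}

func main() {
	table := newStickyTable()
	table.bind("session-1", "token-a")
	fmt.Println(table.lookup("session-1"))

	table.unbindToken("token-a")
	fmt.Println(table.lookup("session-1"))
}
```

In a handler, the session ID would come from the request header, for example `r.Header.Get("session_id")`; an empty value means the request is not sticky.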
- Filter out invalid, cooled down, or exhausted tokens.
- Prefer higher weekly_limit.
- If the top token has <30% 5-hour remaining and another token has higher 5-hour remaining, pick the healthier token.
- If weekly limits tie, pick higher 5-hour remaining.
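The selection rules above can be approximated as follows. This sketch makes several assumptions that the text does not confirm: `weekly_limit` is treated as a comparable numeric score, 5-hour remaining is a percentage, and "the healthier token" means the usable token with the most 5-hour quota left. The `Token` type and `selectToken` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Token is a hypothetical view of one credential's state.
type Token struct {
	ID                string
	Valid             bool
	CooledDown        bool
	Exhausted         bool
	WeeklyLimit       float64 // weekly_limit score; higher is preferred
	FiveHourRemaining float64 // percent of the 5-hour window remaining
}

// selectToken applies the documented rules in order and reports false
// when no token is usable.
func selectToken(tokens []Token) (Token, bool) {
	// Rule 1: drop invalid, cooled-down, or exhausted tokens.
	var usable []Token
	for _, t := range tokens {
		if t.Valid && !t.CooledDown && !t.Exhausted {
			usable = append(usable, t)
		}
	}
	if len(usable) == 0 {
		return Token{}, false
	}
	// Rules 2 and 4: higher weekly_limit first, ties broken by
	// higher 5-hour remaining.
	sort.SliceStable(usable, func(i, j int) bool {
		if usable[i].WeeklyLimit != usable[j].WeeklyLimit {
			return usable[i].WeeklyLimit > usable[j].WeeklyLimit
		}
		return usable[i].FiveHourRemaining > usable[j].FiveHourRemaining
	})
	best := usable[0]
	// Rule 3: if the top token is below 30% in its 5-hour window,
	// switch to the token with the most 5-hour quota remaining.
	if best.FiveHourRemaining < 30 {
		for _, t := range usable[1:] {
			if t.FiveHourRemaining > best.FiveHourRemaining {
				best = t
			}
		}
	}
	return best, true
}

func main() {
	picked, ok := selectToken([]Token{
		{ID: "a", Valid: true, WeeklyLimit: 90, FiveHourRemaining: 10},
		{ID: "b", Valid: true, WeeklyLimit: 50, FiveHourRemaining: 80},
	})
	fmt.Println(picked.ID, ok)
}
```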
If the upstream responds with status 429, returns a Codex usage_limit_reached error, or emits a streamed Responses/WebSocket limit failure, the current token is cooled down and its sticky sessions are cleared. Non-stream requests are retried once with another token.
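The non-stream failover path described above might look like the sketch below. The JSON shape checked in `isLimitResponse` (an `error.type` field equal to `usage_limit_reached`) is an assumption, as are the helper names and the callback signatures; streamed and WebSocket limit failures are not covered.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// isLimitResponse reports whether a response looks like a rate limit.
// The error.type JSON layout is an assumed shape, not a confirmed one.
func isLimitResponse(status int, body []byte) bool {
	if status == http.StatusTooManyRequests {
		return true
	}
	var payload struct {
		Error struct {
			Type string `json:"type"`
		} `json:"error"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return false
	}
	return payload.Error.Type == "usage_limit_reached"
}

// sendWithFailover sends with the first candidate token and, if that
// token is rate limited, cools it down and retries once with the next.
func sendWithFailover(
	candidates []string,
	send func(token string) (int, []byte),
	coolDown func(token string),
) (int, []byte) {
	var status int
	var body []byte
	for attempt, token := range candidates {
		if attempt > 1 {
			break // one initial attempt plus a single retry
		}
		status, body = send(token)
		if !isLimitResponse(status, body) {
			return status, body
		}
		coolDown(token)
	}
	return status, body
}

func main() {
	send := func(token string) (int, []byte) {
		if token == "a" {
			return 429, nil
		}
		return 200, []byte(`{"ok":true}`)
	}
	coolDown := func(token string) { fmt.Println("cooling down", token) }

	status, body := sendWithFailover([]string{"a", "b"}, send, coolDown)
	fmt.Println(status, string(body))
}
```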
- Syncs at startup and every 5 minutes.
- Uses https://chatgpt.com/backend-api/wham/usage.
- Account metadata shown in the dashboard, including user_id, account_id, email, and plan_type, comes from the usage response in real time.
- Per-account usage is grouped by a local identity key: user_id, then account_id. This keeps Business Team members separate when upstream returns the same account_id.
- Before proxying or syncing usage, Codex load balancer refreshes stale access tokens from the stored refresh token.
- On 401, Codex load balancer refreshes once and retries. If the token is still unauthorized during usage sync, it removes the credential file and evicts the token from memory.
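Two ideas from the list above lend themselves to a short sketch: running usage sync with bounded concurrency (compare the sync concurrency setting) and the refresh-once-then-retry behavior on 401. The names `syncAll`, `withRefreshRetry`, and `errUnauthorized` are hypothetical, and the callbacks stand in for the real HTTP calls.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnauthorized stands in for an upstream 401 in this sketch.
var errUnauthorized = errors.New("unauthorized")

// syncAll runs syncOne for every account with at most `concurrency`
// calls in flight and returns one error slot per account.
func syncAll(accounts []string, concurrency int, syncOne func(account string) error) []error {
	if concurrency < 1 {
		concurrency = 1
	}
	errs := make([]error, len(accounts))
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, account := range accounts {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` calls are running
		go func(i int, account string) {
			defer wg.Done()
			defer func() { <-sem }()
			errs[i] = syncOne(account) // each goroutine writes its own slot
		}(i, account)
	}
	wg.Wait()
	return errs
}

// withRefreshRetry performs call; on errUnauthorized it refreshes the
// token once and retries a single time.
func withRefreshRetry(call func() error, refresh func() error) error {
	err := call()
	if !errors.Is(err, errUnauthorized) {
		return err
	}
	if err := refresh(); err != nil {
		return err
	}
	return call()
}

func main() {
	attempts := 0
	errs := syncAll([]string{"account_123"}, 8, func(account string) error {
		return withRefreshRetry(
			func() error {
				attempts++
				if attempts == 1 {
					return errUnauthorized // first call sees a 401
				}
				return nil
			},
			func() error { return nil }, // pretend the refresh succeeded
		)
	})
	fmt.Println(errs[0], attempts)
}
```

A caller that still sees `errUnauthorized` after the retry is the case where, per the list above, the credential file is removed and the token evicted.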
Endpoints:
GET /stats
GET /stats/overview
Auth:
- No auth on /stats* (intended for trusted internal network only).
Dashboard data:
- Overview cards: today, recent_7_days, recent_30_days, total with input_tokens, cached_tokens, output_tokens, reasoning_tokens.
- Trend and composition: recent_90_days, trend.windows[] for 7/30/90 day UTC buckets, and composition for cached input / non-cached input / output split.
- Current dashboard page load uses only /stats/overview.
- Account table: account_key, user_id, account_id, email, plan_type, totals, per-account composition, and 5-hour / weekly quota usage from usage sync (/backend-api/wham/usage).
- The dashboard loads Chart.js from a pinned CDN URL in web/index.html and Alpine.js from a pinned ESM CDN import in web/app.js; Tailwind CSS remains embedded from web/tailwind.css.
Codex load balancer logs structured events via log/slog to stdout.