…ting
The session server proxies each agent turn to the externally-launched SMG
gateway. Tag every proxied chat-completion with X-SMG-Routing-Key=session_id so
a routing-key gateway policy (manual / consistent_hashing) pins the session to
one worker, reusing its KV cache across turns.
Emitted unconditionally: the gateway is launched by the cluster job (RL360), not
by miles, so miles cannot know its policy. The header is ignored by policies
that do not route on it (e.g. cache_aware); only manual / consistent_hashing
read it. Selecting manual + min_load is a gateway-launch (RL360) change, not a
miles change.
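Why emitting the header unconditionally is harmless can be sketched with a toy model of the gateway side. This is not SMG's implementation: the worker names, the `routes_on_key` flag, the hash-modulo pinning, and the round-robin fallback are all hypothetical stand-ins. The only property it illustrates is that a policy which routes on the key sends a session to the same worker every turn, while a policy that does not route on it behaves the same whether or not the header is present.

```python
import hashlib
from itertools import count

WORKERS = ["worker-0", "worker-1", "worker-2"]  # hypothetical worker names
_round_robin = count()


def pick_worker(routes_on_key: bool, headers: dict[str, str]) -> str:
    """Toy worker selection; a stand-in, not the SMG gateway's logic."""
    key = headers.get("X-SMG-Routing-Key")
    if routes_on_key and key is not None:
        # Hash-modulo pinning: the same key always maps to the same worker.
        digest = hashlib.sha256(key.encode()).digest()
        return WORKERS[int.from_bytes(digest[:8], "big") % len(WORKERS)]
    # A policy that does not route on the key never looks at the header.
    # Round-robin is used here purely as a placeholder for such a policy.
    return WORKERS[next(_round_robin) % len(WORKERS)]


turn_headers = {"X-SMG-Routing-Key": "sess-a"}
pinned = {pick_worker(True, turn_headers) for _ in range(4)}
print(len(pinned))  # one distinct worker across all four turns
```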
What this does
The miles session server proxies every agent turn to the SMG gateway. This PR makes it tag each proxied chat-completion with X-SMG-Routing-Key: <session_id>, so a routing-key gateway policy (manual / consistent_hashing) pins a session to one worker and reuses that worker's KV cache across the agent's turns.
This is the miles half of enabling SMG manual routing for agentic-rl runs. The other half (selecting the policy) is a gateway-launch change in RL360; see RL360#355.
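A minimal sketch of the tagging step, assuming the session server builds a header dict per proxied request. The helper name `with_routing_key` and the sample session id are hypothetical and are not taken from sessions.py; only the header name and its value (the session id) come from this PR.

```python
from typing import Mapping

ROUTING_KEY_HEADER = "X-SMG-Routing-Key"


def with_routing_key(headers: Mapping[str, str], session_id: str) -> dict[str, str]:
    """Return a copy of `headers` with the routing key set to the session id."""
    tagged = dict(headers)  # copy so the caller's headers are not mutated
    tagged[ROUTING_KEY_HEADER] = session_id
    return tagged


base = {"Content-Type": "application/json"}
# The initial request and a rollback-retry carry the same key,
# so both are eligible to land on the same worker.
initial = with_routing_key(base, "sess-123")
retry = with_routing_key(base, "sess-123")
```

Setting the key on every request, rather than only when a routing-key policy is known to be active, matches the unconditional behaviour described above.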
Why this is the only miles change needed
The SMG gateway is launched by the cluster job (RL360's agentic-rl.sbatch → lib/launch_gateway.sh → python3 -m smg.launch_router --policy …), and miles connects to it as an external client (--sglang-router-ip / --sglang-router-port). miles does not launch the router and is not told its policy, so it cannot know which policy is active.
This PR therefore emits the routing key unconditionally. That's safe: in the SMG codebase only manual and consistent_hashing read X-SMG-Routing-Key; cache_aware (the current default) ignores it, so today's runs are behaviorally unchanged (aside from a benign per-worker routing-key load counter).
Scope
sessions.py tags proxied chat-completions (initial + rollback-retry) with X-SMG-Routing-Key=session_id.
The single_turn / multi_turn generate functions that call /generate directly are not tagged (they don't go through the session server); selecting manual + min_load is the RL360 gateway-launch change.
Companion RL360 PR
RL360: set the gateway cookbook to policy: manual + assignment_mode: min_load (plumbed through gateway_args.sh → smg.launch_router --policy manual --assignment-mode min_load). RL360#355.
Validation
End-to-end on M2 with the gateway on manual / min_load and this branch mounted: evidence (gateway log policy: Manual, smg_manual_policy_branch_total{branch="occupied_hit"} climbing = sessions pinning). More details in the comment below.
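One way to check that the counter is climbing is to compare two scrapes of the gateway's metrics output. The sketch below assumes the metric is exposed in Prometheus text format with exactly the label set shown above; the helper name `read_counter` is hypothetical, and the sample values are made up for illustration and are not results from the M2 run.

```python
from typing import Optional

METRIC = 'smg_manual_policy_branch_total{branch="occupied_hit"}'


def read_counter(exposition: str, metric: str = METRIC) -> Optional[float]:
    """Return the value of `metric` from Prometheus-style text, or None if absent."""
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if line.startswith(metric + " "):
            # The value is the first token after the metric name and labels.
            return float(line[len(metric):].split()[0])
    return None


# Made-up scrapes; real values come from the gateway's metrics endpoint.
before = 'smg_manual_policy_branch_total{branch="occupied_hit"} 12\n'
after = 'smg_manual_policy_branch_total{branch="occupied_hit"} 57\n'

climbing = read_counter(after) > read_counter(before)
```

If the gateway attaches additional labels to the series, the exact-prefix match here would miss it and a proper Prometheus client parser would be the safer choice.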