Emit X-SMG-Routing-Key from the miles session server (sticky agentic routing)#30

Open
DavidBellamy wants to merge 1 commit into prod from agentic-rl/manual-min-load-routing

@DavidBellamy DavidBellamy commented Jun 3, 2026

Collaborator

What this does

The miles session server proxies every agent turn to the SMG gateway. This PR makes it tag each proxied chat-completion with X-SMG-Routing-Key: <session_id>, so a routing-key gateway policy (manual / consistent_hashing) pins a session to one worker and reuses that worker's KV cache across the agent's turns.

This is the miles half of enabling SMG manual routing for agentic-rl runs. The other half (selecting the policy) is a gateway-launch change in RL360 — see RL360#355.

Why this is the only miles change needed

The SMG gateway is launched by the cluster job (RL360's agentic-rl.sbatch -> lib/launch_gateway.sh -> python3 -m smg.launch_router --policy …), and miles connects to it as an external client (--sglang-router-ip/--sglang-router-port). miles does not launch the router and is not told its policy, so:

  • It cannot gate the header on the gateway's policy.
  • The router-policy / assignment-mode flags on the miles command don't affect the externally-launched gateway.

So this PR emits the routing key unconditionally. That's safe: in the SMG codebase only manual and consistent_hashing read X-SMG-Routing-Key; cache_aware (the current default) ignores it, so today's runs are behaviorally unchanged (aside from a benign per-worker routing-key load counter).

Scope

  • In: sessions.py tags proxied chat-completions (initial + rollback-retry) with X-SMG-Routing-Key=session_id.
  • Out (deliberately): the single_turn / multi_turn generate functions, which call /generate directly, are not tagged (they don't go through the session server); selecting manual + min_load is the RL360 gateway-launch change.
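
A minimal sketch of the tagging described in the "In" item, for illustration only. The helper name `with_routing_key`, the base headers, and the `session-123` value are hypothetical and are not taken from sessions.py; only the header name X-SMG-Routing-Key and the use of the session id as its value come from this PR.

```python
ROUTING_KEY_HEADER = "X-SMG-Routing-Key"


def with_routing_key(headers: dict[str, str], session_id: str) -> dict[str, str]:
    """Return a copy of `headers` that carries the session's routing key."""
    tagged = dict(headers)  # copy so the caller's headers are not mutated
    tagged[ROUTING_KEY_HEADER] = session_id
    return tagged


# Hypothetical usage: the initial request and a rollback-retry for the same
# session get the same key, so a routing-key policy can send both to one worker.
base = {"Content-Type": "application/json"}
initial_headers = with_routing_key(base, "session-123")
retry_headers = with_routing_key(base, "session-123")
```

Because the key is added without checking the gateway policy, a gateway that does not route on the header simply sees one extra header.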

Companion RL360 PR

RL360: set the gateway cookbook to policy: manual + assignment_mode: min_load (plumbed through gateway_args.sh -> smg.launch_router --policy manual --assignment-mode min_load). RL360#355.

Validation

End-to-end on M2 with the gateway on manual/min_load and this branch mounted: evidence (gateway log policy: Manual, smg_manual_policy_branch_total{branch="occupied_hit"} climbing = sessions pinning). More details in the comment below.

@DavidBellamy DavidBellamy requested a review from a team June 3, 2026 22:30
…ting

The session server proxies each agent turn to the externally-launched SMG
gateway. Tag every proxied chat-completion with X-SMG-Routing-Key=session_id so
a routing-key gateway policy (manual / consistent_hashing) pins the session to
one worker, reusing its KV cache across turns.

Emitted unconditionally: the gateway is launched by the cluster job (RL360), not
by miles, so miles cannot know its policy. The header is ignored by policies
that do not route on it (e.g. cache_aware); only manual / consistent_hashing
read it. Selecting manual + min_load is a gateway-launch (RL360) change, not a
miles change.
@DavidBellamy DavidBellamy force-pushed the agentic-rl/manual-min-load-routing branch from c709ba1 to 4aed171 Compare June 4, 2026 02:32
@DavidBellamy DavidBellamy changed the title Support SMG manual routing policy and min_load assignment mode Emit X-SMG-Routing-Key from the session server (sticky agentic routing) Jun 4, 2026
DavidBellamy commented Jun 4, 2026

Collaborator Author

E2E validation on M2 - job 1709479

Setup. pd-hicache-l3 smoke composition, agentic-rl-latest image (Jun 2), 10 nodes, GLM-4.7-Flash PD-disaggregated. Gateway launched with --policy manual --assignment-mode min_load (what RL360#355's pd-manual recipe emits via gateway_args.sh); the miles session-server change (#30 @ 4aed171) mounted over the image's baked miles via PYTHONPATH_PREPEND.

Evidence files (on M2)

  • Trainer / job log: /mnt/weka/shrd/k2pta/rl360/logs/agentic-rl-1709479.log
  • SMG gateway log: /mnt/weka/shrd/k2pta/rl360/logs/gateway-1709479.log

Persistent, re-verifiable evidence (from the logs on M2)

Gateway ran Manual + MinLoad, from the gateway-1709479.log startup line:

| policy: Manual { eviction_interval_secs: 60, max_idle_secs: 14400, assignment_mode: MinLoad } | max_payload: 512MB
... smg::policies::registry: Assigning policy manual to new model model
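
For scripted checks, the startup line above could be parsed roughly as follows. This is a sketch that assumes the line keeps the shape shown here; `parse_policy` is a hypothetical helper, not part of SMG or miles, and the regex mirrors the grep pattern in the Reproduce section.

```python
import re

# Matches "| policy: <Name> { key: value, ... }" as it appears in the startup line.
POLICY_RE = re.compile(r"\| policy: ([A-Za-z]+) \{([^}]*)\}")


def parse_policy(line: str):
    """Return (policy_name, {field: value}), or None if no policy is found."""
    match = POLICY_RE.search(line)
    if match is None:
        return None
    fields = {}
    for part in match.group(2).split(","):
        key, _, value = part.partition(":")
        if key.strip():
            fields[key.strip()] = value.strip()
    return match.group(1), fields


line = (
    "| policy: Manual { eviction_interval_secs: 60, max_idle_secs: 14400, "
    "assignment_mode: MinLoad } | max_payload: 512MB"
)
name, fields = parse_policy(line)
```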

Full RL loop completed (agentic-rl-1709479.log):

step 0: {'train/loss': 0.0, 'train/pg_loss': 0.0, 'train/entropy_loss': 0.24466 ...}
step 1: {'train/loss': 0.0, 'train/pg_loss': 0.0, 'train/entropy_loss': 0.43185 ...}

sacct: COMPLETED 00:40:39 ExitCode 0:0. No real errors (AssertionError=0, CUDA error=0, FAILED=0; the 2 Traceback string hits are inside agent SWE-task rollout text, not exceptions).

This alone establishes the chain end to end: the gateway runs the Manual policy with MinLoad assignment, and a full agentic RL step completes cleanly with it active.

Stickiness: live metric scrape during the run

Captured by polling the SMG metrics endpoint (:29000/metrics) while the job ran (point in time, ~30 min elapsed; the endpoint is torn down when the job ends, so it is not re-scrapeable post-run):

smg_manual_policy_branch_total{branch="vacant"}        8
smg_manual_policy_branch_total{branch="occupied_hit"}  88
smg_manual_policy_cache_entries                         4

occupied_hit was observed climbing 2 -> 4 -> 36 -> 88 across successive scrapes.

  • occupied_hit far exceeds vacant (88 vs 8): a session's 2nd+ turns route back to the worker holding its KV cache, i.e. stickiness.
  • No no_routing_id counter appears and cache_entries > 0: together these confirm that miles' session server is supplying X-SMG-Routing-Key. Without it, every request would land in no_routing_id and cache_entries would be 0.
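
To make the reading of these counters concrete, here is a toy model of sticky routing. It is not SMG's implementation: the class `ToyStickyRouter`, the worker names, and the "fewest assigned sessions" rule for min_load are assumptions made for illustration. Only the branch names (vacant, occupied_hit, no_routing_id) come from the metrics above.

```python
from collections import Counter


class ToyStickyRouter:
    """Toy model: pin each routing key to the least-loaded worker on first sight."""

    def __init__(self, workers):
        self.load = {w: 0 for w in workers}  # sessions assigned per worker
        self.assigned = {}                   # routing key -> worker
        self.branches = Counter()            # per-branch request counts

    def _least_loaded(self):
        return min(self.load, key=self.load.get)

    def route(self, routing_key=None):
        if routing_key is None:
            # No header: nothing to pin on (assumed fallback: least-loaded worker).
            self.branches["no_routing_id"] += 1
            return self._least_loaded()
        if routing_key in self.assigned:
            # Later turn of a known session: go back to its worker.
            self.branches["occupied_hit"] += 1
            return self.assigned[routing_key]
        # First turn of a session: assign it to the least-loaded worker.
        self.branches["vacant"] += 1
        worker = self._least_loaded()
        self.assigned[routing_key] = worker
        self.load[worker] += 1
        return worker


router = ToyStickyRouter(["w0", "w1"])
turns = [router.route("session-a") for _ in range(3)]  # three turns, one session
other = router.route("session-b")                      # a second session
```

In this model each session contributes one vacant count and one occupied_hit per later turn, which is why a high occupied_hit to vacant ratio is read as stickiness.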

Exact commands used for this run

The pd-manual gateway recipe (RL360#355) is not merged yet, so the gateway launch was hand-edited to emit exactly what the recipe produces. Run from scripts/train/agentic-rl/ on an M2 login node:

# 1. pull the latest agentic-rl image from ECR into agentic_rl_images/<tag>.sqsh
bash pull-to-cluster.sh agentic-rl-latest

# 2. compile the smoke sbatch (self-contained; does not submit)
./compile_sbatch.sh pd-hicache-l3 smoke default --image-tag agentic-rl-latest --allow-dirty \
  -o /mnt/weka/home/david.bellamy/manual-routing-smoke2.sbatch

# 3. set the gateway to manual + min_load (what RL360 #355's pd-manual recipe emits) and raise walltime
SB=/mnt/weka/home/david.bellamy/manual-routing-smoke2.sbatch
sed -i 's/--policy cache_aware/--policy manual --assignment-mode min_load/' "$SB"
sed -i 's/#SBATCH --time=00:30:00/#SBATCH --time=01:00:00/' "$SB"

# 4. submit with the miles#30 branch mounted over the image's baked miles
#    B was: git clone -b agentic-rl/manual-min-load-routing https://github.com/LLM360/miles.git "$B"
B=/mnt/weka/home/david.bellamy/miles-routing-e2e   # miles @ 4aed171
PYTHONPATH_PREPEND="$B" sbatch --export=ALL,PYTHONPATH_PREPEND="$B" "$SB"

Once #30 and RL360#355 merge, the hand-edit and mount are unnecessary: add gateway: pd-manual to the composition and run ./agentic-rl <composition> --scale smoke.

Reproduce (while a job runs, from an M2 login node)

grep -aoE '\| policy: [A-Za-z]+ \{[^}]*\}' /mnt/weka/shrd/k2pta/rl360/logs/gateway-<JOBID>.log | head -1
curl -s http://<gateway-head-ip>:29000/metrics | grep smg_manual_policy
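
The output of the curl command can be summarized with a short script. This is a sketch that assumes the simple line shape seen in the scrape earlier in this comment (metric name, optional branch label, numeric value); `parse_manual_policy_metrics` is a hypothetical helper, and the sample text reuses the values from that scrape.

```python
import re

# One sample per line: name, optional {branch="..."} label, numeric value.
SAMPLE_RE = re.compile(r'^(\w+)(?:\{branch="(\w+)"\})?\s+(\d+(?:\.\d+)?)$')


def parse_manual_policy_metrics(text: str):
    """Split scraped lines into per-branch counters and unlabeled metrics."""
    branches, others = {}, {}
    for raw in text.splitlines():
        match = SAMPLE_RE.match(raw.strip())
        if match is None:
            continue  # skips blank lines and "# HELP" / "# TYPE" comments
        name, branch, value = match.group(1), match.group(2), float(match.group(3))
        if name == "smg_manual_policy_branch_total" and branch:
            branches[branch] = value
        else:
            others[name] = value
    return branches, others


scrape = """
smg_manual_policy_branch_total{branch="vacant"}        8
smg_manual_policy_branch_total{branch="occupied_hit"}  88
smg_manual_policy_cache_entries                         4
"""
branches, others = parse_manual_policy_metrics(scrape)
# Share of keyed requests that went back to an already-assigned worker.
hit_share = branches["occupied_hit"] / (branches["occupied_hit"] + branches["vacant"])
```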

@DavidBellamy DavidBellamy changed the title Emit X-SMG-Routing-Key from the session server (sticky agentic routing) Emit X-SMG-Routing-Key from the miles session server (sticky agentic routing) Jun 4, 2026