feat(session): wire aborted turns to provisional record #40
Open
nightlessbaron wants to merge 3 commits into
Spawn N independent vanilla session-server processes on the SGLang gateway node; callee-side random pick per session. Adds structured per-request observability (chat_start / chat_done with timing buckets + token counts) and a per-worker stats heartbeat for memory/throughput.

This PR is the safe (multi-backend + logging) subset of the original PR #33 design. The lock-restoration + DELETE cancellation channel + SessionStateConflictError 409 surface (which originally shipped in this PR) have been split out to a follow-up so they can be reviewed and landed independently. See the lock-restore PR for that work.

Session-server changes:
- Spawn N independent backends + callee-side pick (miles/ray/rollout.py, miles/utils/arguments.py)
- chat_start / chat_done structured logs with req_id (uuid8) and timing buckets (lock_wait_ms, tokenize_in_ms, proxy_elapsed_ms, tokenize_out_ms, total_ms, inflight_now, prompt_tokens, completion_tokens). One INFO log per request, success or short-circuit.
- Per-worker _stats (reqs_total, turns_completed, inflight) + _worker_stats registry keyed by session_server_port. Background _stats_logger_loop in session_server.py emits 30s heartbeat with rss_mb / vms_mb via psutil.
- pid=<n> prefix on every record in each subprocess via logging.basicConfig in run_session_server (force=True overrides ray-set handlers).
- Enriched state_changed_during_proxy warning with worker_port, inflight_chat_count, caller_request_id (x-request-id), proxy_elapsed_ms; grep-correlatable with Layer-1 client retries.
- prepare_pretokenized + update_pretokenized_state run in asyncio.to_thread so the event loop isn't blocked by sync TITO tokenizer calls (independent perf win, ~40% off p99 in microbench).
- proxy_transport_error warning includes url + elapsed_ms.

Dependencies:
- psutil hard-pinned in requirements.txt (consumed by the stats heartbeat; no graceful fallback, because we want memory metrics in all environments).

Race-tests baseline preserved: 5 pass / 4 fail (same as prod). The 4 failures are the documented get_or_create_session auto-create-after-delete behaviour, unrelated to this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
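For reviewers who want a concrete picture of the `asyncio.to_thread` offload and the per-request timing buckets described in the first commit, here is a minimal sketch. It is not the code in this PR: `slow_tokenize` is a hypothetical stand-in for the sync TITO tokenizer calls, `handle_chat` is an invented handler name, and reading "uuid8" as the first 8 hex characters of a UUID4 is an assumption. Only two of the listed buckets are shown.

```python
import asyncio
import logging
import time
import uuid

logger = logging.getLogger("session_server_sketch")  # hypothetical logger name


def slow_tokenize(text: str) -> list[int]:
    """Hypothetical stand-in for a blocking (sync) tokenizer call."""
    time.sleep(0.05)  # simulate CPU-bound / blocking work
    return [ord(c) for c in text]


async def handle_chat(prompt: str) -> dict:
    """Tokenize off the event loop and emit one structured log line."""
    req_id = uuid.uuid4().hex[:8]  # assumption: "uuid8" = 8 hex chars
    t_start = time.perf_counter()

    # Run the blocking call in a worker thread so other coroutines keep running.
    tokens = await asyncio.to_thread(slow_tokenize, prompt)
    tokenize_in_ms = (time.perf_counter() - t_start) * 1000

    record = {
        "req_id": req_id,
        "tokenize_in_ms": round(tokenize_in_ms, 2),
        "total_ms": round((time.perf_counter() - t_start) * 1000, 2),
        "prompt_tokens": len(tokens),
    }
    logger.info("chat_done %s", record)
    return record


async def main() -> tuple[dict, int]:
    """Run one request next to a ticker to show the loop stays responsive."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    task = asyncio.create_task(ticker())
    record = await handle_chat("hello")
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return record, ticks


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(asyncio.run(main()))
```

If `slow_tokenize` were called directly inside `handle_chat` instead of through `asyncio.to_thread`, the ticker would not advance while tokenization runs, which is the event-loop blocking the commit message describes.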
Apply formatter fixes flagged by CI's pre-commit run. Eleven files touched: four of them inside this PR's scope (generate_utils/openai_endpoint_utils.py, session/sessions.py, session/linear_trajectory.py, ray/rollout.py; single-line/spacing fixes only) and seven pre-existing style-debt files elsewhere in the repo that pre-commit's --all-files mode picked up. All changes are formatter output, no semantic edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On finish_reason=="abort", append a provisional (tail-only) record, set the x-sglang-aborted header, and return early without committing token state.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Base automatically changed from feat/session-server-multi-backend-vanilla to prod June 10, 2026 22:05
What
Wires the session-server abort path into the provisional-record protocol. Inside the refactored `setup_session_routes` Phase 3 (under the session lock, after the `num_assistant` mismatch guard), when `choice.get("finish_reason") == "abort"`:
- appends a provisional (tail-only) `SessionRecord` via `session.append_provisional_record(...)`,
- sets the `x-sglang-aborted: 1` response header (the only litellm-proof abort signal, since litellm remaps `abort` to `stop`),
- returns early with `backend.build_proxy_response(result)`, without committing token state.

This lets the agent's abort-retry re-issue the same turn from the prior checkpoint; a successful retry supersedes the provisional record, and if retries are exhausted it remains as the honest final ABORTED turn.
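To make the supersede semantics concrete, here is a toy model of the abort branch. It is a sketch under stated assumptions, not the implementation in this PR: the `Record` and `Session` classes, `commit_record`, and `handle_choice` are hypothetical, the real `append_provisional_record(...)` signature is not shown in this description, and reading "tail-only" as "a provisional record only ever sits at the tail and is dropped when the next record lands" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for SessionRecord."""
    tokens: list[int]
    provisional: bool = False


@dataclass
class Session:
    """Toy session holding a linear list of records."""
    records: list[Record] = field(default_factory=list)

    def _drop_provisional_tail(self) -> None:
        # Assumption: at most one provisional record, always at the tail.
        if self.records and self.records[-1].provisional:
            self.records.pop()

    def append_provisional_record(self, tokens: list[int]) -> None:
        # An aborted turn is recorded without advancing committed state.
        self._drop_provisional_tail()
        self.records.append(Record(tokens, provisional=True))

    def commit_record(self, tokens: list[int]) -> None:
        # A successful turn supersedes any provisional tail.
        self._drop_provisional_tail()
        self.records.append(Record(tokens))


def handle_choice(session: Session, choice: dict, tokens: list[int]) -> dict:
    """Return the extra response headers for one completed proxy call."""
    headers: dict[str, str] = {}
    if choice.get("finish_reason") == "abort":
        session.append_provisional_record(tokens)
        headers["x-sglang-aborted"] = "1"  # survives litellm's abort->stop remap
        return headers  # early return: no token state committed
    session.commit_record(tokens)
    return headers


if __name__ == "__main__":
    s = Session()
    s.commit_record([1])                                    # prior checkpoint
    print(handle_choice(s, {"finish_reason": "abort"}, [2, 3]))
    print(handle_choice(s, {"finish_reason": "stop"}, [2, 3, 4]))
    print(s.records)
```

In this model a client would key its retry on the `x-sglang-aborted` header rather than on `finish_reason`, since the description notes that litellm remaps `abort` to `stop`.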
Base
Based on `feat/session-server-multi-backend-vanilla` (PR #33, Richard Fan's session-server multi-backend refactor), because the abort hunk targets the refactored `setup_session_routes` / `build_proxy_response` path, not prod.
Relationships
- Replaces the `x-sglang-aborted` header stub with the full provisional-record behavior (preserving the header). Does not close Abort logic for harbor trajectories #28.
- Uses the `append_provisional_record` mechanism shipped in feat(session): provisional-record protocol for aborted turns #39 (the standalone `LinearTrajectory` change + tests).