feat(session): wire aborted turns to provisional record #40

Open
nightlessbaron wants to merge 3 commits into prod from feat/session-abort-provisional-wiring

@nightlessbaron

What

Wires the session-server abort path into the provisional-record protocol. Inside the refactored setup_session_routes Phase 3 (under the session lock, after the num_assistant mismatch guard), when choice.get("finish_reason") == "abort":

  • append a provisional (tail-only, no token-state commit) SessionRecord via session.append_provisional_record(...),
  • set the x-sglang-aborted: 1 response header (the only litellm-proof abort signal, since litellm remaps abort to stop),
  • return early via backend.build_proxy_response(result).
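
The abort branch listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: the `Session` class, its `commit_record` method, the record shape, and the `handle_choice` helper are hypothetical stand-ins. Only `append_provisional_record`, the `finish_reason == "abort"` check, and the `x-sglang-aborted: 1` header come from the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """Hypothetical stand-in for the real session object."""

    records: list = field(default_factory=list)  # committed turns
    provisional: Optional[dict] = None            # tail-only aborted turn, if any
    committed_tokens: int = 0                     # token-state checkpoint

    def append_provisional_record(self, record: dict) -> None:
        # Tail-only: remember the aborted turn, leave token state untouched.
        self.provisional = record

    def commit_record(self, record: dict, num_tokens: int) -> None:
        # Hypothetical commit path: a successful turn replaces any provisional tail.
        self.provisional = None
        self.records.append(record)
        self.committed_tokens += num_tokens


def handle_choice(session: Session, choice: dict) -> dict:
    """Return the extra response headers for one completed proxy call."""
    headers: dict = {}
    if choice.get("finish_reason") == "abort":
        session.append_provisional_record({"status": "ABORTED", "choice": choice})
        headers["x-sglang-aborted"] = "1"  # abort signal carried outside the body
        return headers                      # early return: nothing is committed
    session.commit_record(
        {"status": "OK", "choice": choice},
        choice.get("completion_tokens", 0),
    )
    return headers
```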

This lets the agent's abort-retry logic re-issue the same turn from the prior checkpoint. A successful retry supersedes the provisional record; if retries are exhausted, the provisional record remains as the honest final ABORTED turn.
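
On the agent side, the retry behaviour might look like the sketch below. Everything here is an assumption made for illustration: `run_turn_with_retry`, `is_aborted`, the `(headers, body)` response shape, and the `max_retries` parameter are hypothetical names, and the real agent may structure its retry differently. The one detail taken from the description is that the client checks the `x-sglang-aborted` header rather than `finish_reason`, because the body may already report `stop`.

```python
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (response headers, parsed JSON body).
Response = Tuple[Mapping[str, str], dict]


def is_aborted(headers: Mapping[str, str]) -> bool:
    # Assumes header names were normalised to lowercase by the HTTP client.
    return headers.get("x-sglang-aborted") == "1"


def run_turn_with_retry(
    send_turn: Callable[[], Response],
    max_retries: int = 2,
) -> Tuple[dict, bool]:
    """Send one turn, re-issuing it while the server reports an abort.

    Returns (body, aborted). aborted is True only when every attempt,
    including the final retry, came back with the abort header set.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    body: dict = {}
    for _ in range(1 + max_retries):
        headers, body = send_turn()
        if not is_aborted(headers):
            return body, False  # success: this turn supersedes the provisional one
    return body, True           # retries exhausted: the aborted turn stands
```

Because the aborted attempt never committed token state, each retry in this sketch simply resends the same request; no client-side rollback is modelled.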

Base

Based on feat/session-server-multi-backend-vanilla (PR #33, Richard Fan's session-server multi-backend refactor) rather than prod, because the abort hunk targets the refactored setup_session_routes / build_proxy_response path.

Relationships

rmfan and others added 3 commits June 9, 2026 15:12
Spawn N independent vanilla session-server processes on the SGLang
gateway node; callee-side random pick per session. Adds structured
per-request observability (chat_start / chat_done with timing buckets
+ token counts) and a per-worker stats heartbeat for memory/throughput.

This PR is the safe (multi-backend + logging) subset of the original
PR #33 design. The lock-restoration + DELETE cancellation channel +
SessionStateConflictError 409 surface — which originally shipped in
this PR — have been split out to a follow-up so they can be reviewed
and landed independently. See the lock-restore PR for that work.

Session-server changes:
- Spawn N independent backends + callee-side pick (miles/ray/rollout.py,
  miles/utils/arguments.py)
- chat_start / chat_done structured logs with req_id (uuid8) and
  timing buckets (lock_wait_ms, tokenize_in_ms, proxy_elapsed_ms,
  tokenize_out_ms, total_ms, inflight_now, prompt_tokens,
  completion_tokens). One INFO log per request, success or short-circuit.
- Per-worker _stats (reqs_total, turns_completed, inflight) +
  _worker_stats registry keyed by session_server_port. Background
  _stats_logger_loop in session_server.py emits 30s heartbeat with
  rss_mb / vms_mb via psutil.
- pid=<n> prefix on every record in each subprocess via
  logging.basicConfig in run_session_server (force=True overrides
  ray-set handlers).
- Enriched state_changed_during_proxy warning with worker_port,
  inflight_chat_count, caller_request_id (x-request-id),
  proxy_elapsed_ms — grep-correlatable with Layer-1 client retries.
- prepare_pretokenized + update_pretokenized_state run in
  asyncio.to_thread so the event loop isn't blocked by sync
  TITO tokenizer calls (independent perf win, ~40% off p99 in
  microbench).
- proxy_transport_error warning includes url + elapsed_ms.

Dependencies:
- psutil hard-pinned in requirements.txt (consumed by the stats
  heartbeat; no graceful fallback — we want memory metrics in all
  environments).

Race-tests baseline preserved: 5 pass / 4 fail (same as prod). The
4 failures are the documented get_or_create_session
auto-create-after-delete behaviour, unrelated to this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Apply formatter fixes flagged by CI's pre-commit run. Eleven files
touched: four of them inside this PR's scope
(generate_utils/openai_endpoint_utils.py, session/sessions.py,
session/linear_trajectory.py, ray/rollout.py — single-line/spacing
fixes only) and seven pre-existing style-debt files elsewhere in the
repo that pre-commit's --all-files mode picked up. All changes are
formatter output, no semantic edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

On finish_reason=="abort", append a provisional (tail-only) record, set x-sglang-aborted header, and return early without committing token state.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@nightlessbaron nightlessbaron requested a review from a team as a code owner June 10, 2026 04:39
Base automatically changed from feat/session-server-multi-backend-vanilla to prod June 10, 2026 22:05