
feat: Responses API streaming, WebSocket mode, and reasoning token support #9

Merged
jhaynie merged 3 commits into main from feat/responses-api-streaming-websocket
Apr 16, 2026

Conversation

@jhaynie (Member) commented Apr 16, 2026

Summary

  • Fix stream_options.include_usage being incorrectly injected for Responses API requests (was causing 400 errors)
  • Add dedicated Responses API streaming extractor with usage extraction from response.completed events
  • Add WebSocket mode for the Responses API with zero-dependency adapter pattern
  • Fix reasoning token visibility across all extraction paths

Changes

Bug Fix: stream_options on Responses API

The proxy was injecting stream_options.include_usage into Responses API streaming requests, which don't support that parameter. Fixed by detecting API type before injection and skipping for Responses API.
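The gating can be sketched as follows — function and field names here are illustrative stand-ins, not the library's actual identifiers; the point is that API-type detection (path-based or body-based, as the tests cover) must run before any injection:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// apiTypeFor guesses the API type from the request path and body.
// Hypothetical helper: the Responses API uses an "input" field where
// Chat Completions uses "messages".
func apiTypeFor(path string, body map[string]any) string {
	if strings.HasSuffix(path, "/responses") {
		return "responses"
	}
	if _, ok := body["input"]; ok {
		return "responses"
	}
	return "chat_completions"
}

// injectStreamOptions adds stream_options.include_usage only for APIs that
// need it; the Responses API rejects the parameter (the 400s this PR fixes)
// and reports usage in response.completed instead.
func injectStreamOptions(apiType string, body map[string]any) {
	if apiType == "responses" {
		return
	}
	if stream, _ := body["stream"].(bool); stream {
		body["stream_options"] = map[string]any{"include_usage": true}
	}
}

func main() {
	chat := map[string]any{"stream": true, "messages": []any{}}
	injectStreamOptions(apiTypeFor("/v1/chat/completions", chat), chat)

	resp := map[string]any{"stream": true, "input": "hi"}
	injectStreamOptions(apiTypeFor("/v1/responses", resp), resp)

	a, _ := json.Marshal(chat)
	b, _ := json.Marshal(resp)
	fmt.Println(string(a)) // chat body gains stream_options
	fmt.Println(string(b)) // responses body is left untouched
}
```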

Responses API Streaming Extractor

New ResponsesStreamingExtractor that understands the Responses API SSE event format (response.created, response.output_text.delta, response.completed, etc.). Extracts usage, model, cache tokens, and reasoning tokens from response.completed events. The StreamingMultiAPIExtractor now dispatches to the correct extractor based on api_type in the request context.
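The usage-bearing event can be parsed with a small struct mirroring OpenAI's wire format — field tags follow the documented Responses API payload, but the struct and function names below are a sketch, not the library's types:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// responsesEvent models the subset of a Responses API SSE payload needed
// for billing: usage plus cached/reasoning token details.
type responsesEvent struct {
	Type     string `json:"type"`
	Response struct {
		Model string `json:"model"`
		Usage struct {
			InputTokens        int `json:"input_tokens"`
			OutputTokens       int `json:"output_tokens"`
			InputTokensDetails struct {
				CachedTokens int `json:"cached_tokens"`
			} `json:"input_tokens_details"`
			OutputTokensDetails struct {
				ReasoningTokens int `json:"reasoning_tokens"`
			} `json:"output_tokens_details"`
		} `json:"usage"`
	} `json:"response"`
}

// parseSSEData extracts usage from one "data: {...}" SSE line when it carries
// a response.completed event; any other event (or [DONE]) returns ok=false.
func parseSSEData(line string) (ev responsesEvent, ok bool) {
	data := strings.TrimPrefix(strings.TrimSpace(line), "data: ")
	if data == "" || data == "[DONE]" {
		return ev, false
	}
	if json.Unmarshal([]byte(data), &ev) != nil || ev.Type != "response.completed" {
		return ev, false
	}
	return ev, true
}

func main() {
	line := `data: {"type":"response.completed","response":{"model":"gpt-4o","usage":{"input_tokens":120,"output_tokens":45,"input_tokens_details":{"cached_tokens":100},"output_tokens_details":{"reasoning_tokens":12}}}}`
	if ev, ok := parseSSEData(line); ok {
		fmt.Println(ev.Response.Usage.InputTokens, ev.Response.Usage.OutputTokensDetails.ReasoningTokens) // 120 12
	}
}
```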

WebSocket Mode (Adapter Pattern)

Implements persistent WebSocket connections for multi-turn Responses API workflows. Uses a zero-dependency adapter pattern — the library defines WSConn, WSUpgrader, and WSDialer interfaces that consumers implement with their preferred WS library (gorilla, nhooyr, etc.). gorilla's *websocket.Conn satisfies WSConn directly.

Features:

  • Bidirectional relay with sync.Once close coordination
  • Per-turn billing via WSBillingCallback
  • Model prefix stripping in response.create messages
  • Opt-in via WithAutoRouterWebSocket(upgrader, dialer)
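The adapter contract can be sketched with the standard library alone. The interface names (WSConn, WSUpgrader, WSDialer) come from the PR; their exact definitions live in websocket.go, and the memConn mock below is purely hypothetical. Any type with these three methods plugs in — gorilla's *websocket.Conn has exactly this method set, which is why it satisfies WSConn with no wrapper, while Upgrader and Dialer need thin ones:

```go
package main

import (
	"fmt"
	"net/http"
)

// Interface shapes mirroring the adapter contracts described in the PR.
type WSConn interface {
	ReadMessage() (messageType int, p []byte, err error)
	WriteMessage(messageType int, data []byte) error
	Close() error
}

type WSUpgrader interface {
	Upgrade(w http.ResponseWriter, r *http.Request, h http.Header) (WSConn, error)
}

type WSDialer interface {
	Dial(url string, h http.Header) (WSConn, *http.Response, error)
}

// memConn is a toy in-memory connection: writes queue frames, reads pop them.
type memConn struct{ frames [][]byte }

func (c *memConn) ReadMessage() (int, []byte, error) {
	if len(c.frames) == 0 {
		return 0, nil, fmt.Errorf("connection closed")
	}
	f := c.frames[0]
	c.frames = c.frames[1:]
	return 1, f, nil // 1 = text message, matching RFC 6455 / gorilla constants
}

func (c *memConn) WriteMessage(_ int, data []byte) error {
	c.frames = append(c.frames, data)
	return nil
}

func (c *memConn) Close() error { return nil }

// Compile-time conformance check, the same guard the PR adds for providers.
var _ WSConn = (*memConn)(nil)

func main() {
	var conn WSConn = &memConn{}
	conn.WriteMessage(1, []byte(`{"type":"response.create"}`))
	_, msg, _ := conn.ReadMessage()
	fmt.Println(string(msg))
}
```

The zero-dependency design keeps the library's go.mod free of any WebSocket implementation; consumers pay the dependency cost only if they opt in.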

Reasoning Token Consistency

Previously only 1 of 5 extraction paths stored reasoning_tokens in metadata. Now all paths consistently expose it via meta.Custom["reasoning_tokens"]:

  • Non-streaming Chat Completions
  • Non-streaming Responses API
  • Streaming Chat Completions
  • Streaming Responses API
  • WebSocket
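Consumers read the count defensively, since the key is only present when the model actually emitted reasoning tokens. A sketch with a stand-in metadata struct (the meta.Custom["reasoning_tokens"] key is from the PR; the struct here is not the library's):

```go
package main

import "fmt"

// ResponseMetadata stand-in: the real type carries usage plus a Custom map
// of extractor-specific fields.
type ResponseMetadata struct {
	InputTokens  int
	OutputTokens int
	Custom       map[string]any
}

// reasoningTokens reads the count defensively: absent key or wrong type
// yields zero rather than a panic.
func reasoningTokens(meta ResponseMetadata) int {
	if v, ok := meta.Custom["reasoning_tokens"].(int); ok {
		return v
	}
	return 0
}

func main() {
	meta := ResponseMetadata{
		InputTokens:  120,
		OutputTokens: 45,
		Custom:       map[string]any{"reasoning_tokens": 12},
	}
	fmt.Println(reasoningTokens(meta)) // 12
}
```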

Documentation

  • Updated DESIGN.md with WebSocket mode section, flow diagram, and gorilla example
  • Updated README.md with WebSocket section and complete Go + Python examples

Test Coverage

  • 60+ new tests across 9 test files
  • Responses API streaming: 13 tests (lifecycle, usage, cache, reasoning, function calls, errors, passthrough)
  • WebSocket: 15 tests (relay, multi-turn, billing, close handling, prefix stripping)
  • Reasoning tokens: 12 tests across all extraction paths
  • MultiAPI dispatch: 3 tests (context routing, fallback)
  • stream_options fix: 2 subtests (path-based, body-based detection)

All tests pass: go test ./...

Summary by CodeRabbit

  • New Features

    • Opt-in WebSocket mode for persistent multi-turn workflows with per-turn billing and callbacks; Responses API now supports both HTTP streaming and WebSocket.
  • Enhancements

    • Improved usage extraction to include reasoning-token counts and cached-token reporting.
    • Router no longer injects stream_options.include_usage for Responses/other non-ChatCompletions providers.
  • Documentation

    • README and design docs updated with WebSocket Mode examples and streaming behavior guidance.
  • Tests

    • Extensive new tests for streaming, WebSocket relay, usage extraction, and billing.

…n support

- Fix stream_options.include_usage injection for Responses API requests
- Add dedicated ResponsesStreamingExtractor for SSE streaming usage extraction
- Add WebSocket mode with zero-dependency adapter pattern (WSConn, WSUpgrader, WSDialer)
- Implement bidirectional relay with per-turn billing and model prefix stripping
- Add consistent reasoning_tokens extraction across all 5 extraction paths
- Update DESIGN.md and README.md with WebSocket docs and gorilla example
- Add 60+ new tests covering streaming, WebSocket, and reasoning tokens
@coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 13790c8b-73b4-4fd9-91dc-8acf5da48d37

📥 Commits

Reviewing files that changed from the base of the PR and between e939cda and b6e0862.

📒 Files selected for processing (1)
  • DESIGN.md
✅ Files skipped from review due to trivial changes (1)
  • DESIGN.md

📝 Walkthrough

Walkthrough

Adds a WebSocket mode for forwarding OpenAI Responses (client↔upstream) with per‑turn billing callbacks, SSE and WebSocket streaming extractors (including reasoning/cache token capture), WebSocket abstractions/helpers, provider WebSocket URL resolution, model rewriting, ServeHTTP WS detection, and extensive tests and docs updates.

Changes

Cohort / File(s) Summary
Documentation
DESIGN.md, README.md
Documented WebSocket Mode, added examples (Go + Python), updated streaming usage guidance distinguishing Chat Completions vs Responses, and described WS billing callback and injector behavior.
AutoRouter core & options
autorouter.go
Added wsUpgrader, wsDialer, wsBillingCallback fields; new options WithAutoRouterWebSocket and WithAutoRouterWSBillingCallback; WS upgrade detection; gate stream_options injection by API type.
AutoRouter WebSocket relay
autorouter_websocket.go, autorouter_websocket_test.go, autorouter_test.go
Implemented ForwardWebSocket, client↔upstream bidirectional relay, initial response.create validation/model rewriting, metadata enrichment (api_type=responses/provider/model), usage extraction, per‑turn billing callback, connection lifecycle handling, and comprehensive WS tests (relay, billing, edge cases).
WebSocket abstractions & parsing
websocket.go, websocket_test.go
Added WSConn/WSUpgrader/WSDialer interfaces, WebSocketCapableProvider interface, WS message structs, ParseWSMessage, and ExtractWSUsage (maps Responses response.completed usage incl. cached/reasoning tokens).
OpenAI provider WebSocket support
providers/openai/provider.go, providers/openai_compatible/provider.go
Added compile-time assertions for WebSocket capability; implemented WebSocketURL(meta) delegation on openai_compatible.Provider.
OpenAI-compatible resolver & WebSocket URL
providers/openai_compatible/websocket.go, providers/openai_compatible/websocket_test.go, providers/openai_compatible/resolver.go
Added Resolver.WebSocketURL converting http(s)→ws(s) and appending /v1/responses; normalized base URL parsing to trim trailing slashes and strip terminal /v1; unit tests for URL behavior.
Responses SSE streaming extractor
providers/openai_compatible/responses_streaming_extractor.go, providers/openai_compatible/responses_streaming_extractor_test.go
New ResponsesStreamingExtractor that handles SSE line-by-line passthrough, parses response events, accumulates usage across events, and exposes metadata (including reasoning/cache tokens); extensive streaming tests.
Streaming types & usage extraction
streaming.go, streaming_test.go
Added Responses SSE types/events, ParseResponsesSSEEvent, ExtractUsageFromResponsesEvent, and StreamingUsage.ReasoningTokens; tests for parsing and usage extraction.
Extractor enhancements & dispatch
providers/openai_compatible/streaming_extractor.go, providers/openai_compatible/multiapi.go, providers/openai_compatible/extractor.go, providers/openai_compatible/responses_extractor.go, providers/openai_compatible/*_test.go
Mapped completion/response reasoning tokens and cached token details into meta.Custom; StreamingMultiAPIExtractor now routes Responses SSE to the new extractor; added/updated tests validating reasoning/cache extraction and dispatch.
Tests: integration & unit
autorouter_websocket_test.go, websocket_test.go, streaming_test.go, providers/.../*_test.go
Added many unit/integration tests covering WS relay behavior, SSE/WebSocket parsing, usage/reasoning/cache token propagation, billing callback invocation, model prefix stripping, and ServeHTTP WS detection.
🚥 Pre-merge checks | ✅ 1
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.




@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
DESIGN.md (1)

662-695: Add language specifier to the flow diagram code block.

The static analysis tool flagged this fenced code block as missing a language specifier.

📝 Proposed fix
-```
+```text
 +------------------+        +------------------+        +------------------+
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN.md` around lines 662 - 695, The fenced diagram in the "WebSocket Flow"
section is missing a language specifier causing static analysis to flag it;
update the opening fence for the diagram (the triple backticks that begin the
code block under "WebSocket Flow") to include a language like "text" (i.e.,
change ``` to ```text) so the block is explicitly typed; ensure any other
adjacent fenced diagram blocks in that section are similarly updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@providers/openai_compatible/websocket_test.go`:
- Around line 51-63: The WebSocket URL builder currently duplicates "/v1" when a
BaseURL already ends with that suffix; update NewResolver (or the resolver
initialization) to normalize the provided base URL by removing any trailing
"/v1" or "/v1/" before storing it, so that subsequent calls to
r.WebSocketURL(llmproxy.BodyMetadata{}) append a single "/v1/responses"; locate
the normalization logic in NewResolver and ensure it trims a trailing "/v1"
(case-sensitive) and any extra slash, or add a small helper (e.g.,
normalizeBaseURL) used by NewResolver and referenced by WebSocketURL to prevent
double "/v1" segments.
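A minimal sketch of the normalization this comment asks for — helper names (normalizeBaseURL, webSocketURL) are taken from the review's suggestion, not the codebase:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeBaseURL trims a trailing slash and a terminal "/v1" so later
// joins append exactly one "/v1/responses" segment.
func normalizeBaseURL(base string) string {
	base = strings.TrimRight(base, "/")
	base = strings.TrimSuffix(base, "/v1")
	return base
}

// webSocketURL converts http(s) to ws(s) and appends /v1/responses.
func webSocketURL(base string) (string, error) {
	u, err := url.Parse(normalizeBaseURL(base))
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "https":
		u.Scheme = "wss"
	case "http":
		u.Scheme = "ws"
	}
	return u.JoinPath("v1", "responses").String(), nil
}

func main() {
	// With or without a trailing /v1, both bases resolve identically.
	for _, base := range []string{"https://api.openai.com/v1", "https://api.openai.com"} {
		ws, _ := webSocketURL(base)
		fmt.Println(ws) // wss://api.openai.com/v1/responses
	}
}
```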

In `@providers/openai_compatible/websocket.go`:
- Line 27: The ws URL builder currently always appends "v1/responses" via
u.JoinPath("v1", "responses"), causing a duplicate /v1 when BaseURL already ends
with /v1; modify the logic that builds the websocket path to check the parsed
URL's Path (e.g., u.Path or url.Path) and if it already has a trailing "/v1"
(use strings.HasSuffix(u.Path, "/v1") or normalize trailing slashes) then call
u.JoinPath("responses") (or join only "responses"), otherwise call
u.JoinPath("v1", "responses"); ensure you normalize slashes so neither double
nor missing slashes occur and keep the return signature the same.

In `@README.md`:
- Around line 231-233: The README example uses an unsafe CheckOrigin that
unconditionally returns true; update the gorillaUpgrader/websocket.Upgrader
CheckOrigin implementation to validate the request Origin header against a
whitelist of trusted origins (or use same-origin checks) before allowing the
upgrade. Replace the unconditional return true with logic in the CheckOrigin
callback that reads r.Header.Get("Origin") and compares it to a configured list
(or derives allowed origin from the request) and only returns true for matches;
mention using a configurable trustedOrigins list and the
gorillaUpgrader/websocket.Upgrader symbols so readers can copy a secure pattern
for production.

---

Nitpick comments:
In `@DESIGN.md`:
- Around line 662-695: The fenced diagram in the "WebSocket Flow" section is
missing a language specifier causing static analysis to flag it; update the
opening fence for the diagram (the triple backticks that begin the code block
under "WebSocket Flow") to include a language like "text" (i.e., change ``` to
```text) so the block is explicitly typed; ensure any other adjacent fenced
diagram blocks in that section are similarly updated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b82b7fc-75fd-4fc9-a320-547e4cac3f59

📥 Commits

Reviewing files that changed from the base of the PR and between 94d029e and 01654a4.

📒 Files selected for processing (23)
  • DESIGN.md
  • README.md
  • autorouter.go
  • autorouter_test.go
  • autorouter_websocket.go
  • autorouter_websocket_test.go
  • providers/openai/provider.go
  • providers/openai_compatible/extractor.go
  • providers/openai_compatible/extractor_test.go
  • providers/openai_compatible/multiapi.go
  • providers/openai_compatible/provider.go
  • providers/openai_compatible/responses_extractor.go
  • providers/openai_compatible/responses_streaming_extractor.go
  • providers/openai_compatible/responses_streaming_extractor_test.go
  • providers/openai_compatible/responses_test.go
  • providers/openai_compatible/streaming_extractor.go
  • providers/openai_compatible/streaming_extractor_test.go
  • providers/openai_compatible/websocket.go
  • providers/openai_compatible/websocket_test.go
  • streaming.go
  • streaming_test.go
  • websocket.go
  • websocket_test.go
📜 Review details
🧰 Additional context used
🪛 markdownlint-cli2 (0.22.0)
DESIGN.md

[warning] 664-664: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (60)
providers/openai_compatible/responses_extractor.go (1)

61-63: Good reasoning-token extraction guardrails.

Nil-check + positive-value guard is consistent with existing metadata extraction and avoids noisy zero-value fields.

providers/openai_compatible/extractor.go (1)

55-57: Reasoning token propagation looks correct.

This keeps non-streaming chat-completions extraction aligned with the other extractor paths.

providers/openai_compatible/streaming_extractor.go (1)

176-178: Streaming reasoning-token metadata wiring is solid.

The field is emitted only when available, alongside other accumulated usage metadata.

providers/openai/provider.go (1)

13-17: Nice compile-time interface conformance check.

This is a good guard to prevent regressions in WebSocket-capable provider behavior.

providers/openai_compatible/multiapi.go (1)

82-106: Streaming extractor dispatch update looks good.

The dedicated Responses SSE path is now explicitly routed by api_type, with safe fallback behavior.

autorouter_test.go (1)

739-822: Great regression coverage for Responses streaming stream-options behavior.

Both path-based and body-based detection cases are covered and validate the intended non-injection behavior.

providers/openai_compatible/provider.go (1)

74-87: LGTM!

The WebSocketURL method implementation is clean and follows good Go patterns:

  • Proper nil check on resolver
  • Type assertion to optional interface for WebSocket capability
  • Clear error messages for both failure modes
providers/openai_compatible/extractor_test.go (1)

12-135: LGTM!

Comprehensive test coverage for reasoning token extraction:

  • Validates non-zero reasoning tokens are extracted and stored as int
  • Confirms zero reasoning tokens are omitted from metadata (avoiding noise)
  • Tests combined extraction of cache usage and reasoning tokens
providers/openai_compatible/streaming_extractor_test.go (1)

149-230: LGTM!

Well-structured streaming tests for reasoning token extraction that mirror the non-streaming tests. The SSE format is realistic and validates that reasoning tokens flow correctly through the streaming extraction path.

websocket_test.go (1)

1-100: LGTM!

Solid test coverage for WebSocket utilities:

  • ParseWSMessage tests cover various message types and error cases
  • ExtractWSUsage tests validate token extraction including cache and reasoning tokens
  • Malformed JSON error handling is properly tested
streaming_test.go (2)

217-260: LGTM!

Good extension of the OpenAI chunk usage extraction tests to cover reasoning tokens, including the combined cache + reasoning scenario.


611-808: LGTM!

Comprehensive test suite for Responses API SSE parsing:

  • Event type parsing (response.created, response.output_text.delta, response.completed)
  • Edge cases (empty input, [DONE] marker, malformed JSON)
  • Usage extraction with all token detail variants (cached, reasoning)
providers/openai_compatible/responses_test.go (2)

1049-1056: LGTM!

Good addition of explicit reasoning token verification in the Responses extractor test, ensuring consistency with other extraction paths.


1720-1825: LGTM!

Excellent test coverage for the streaming multi-API extractor dispatch logic:

  • Validates correct dispatch to Responses API extractor based on context
  • Validates dispatch to Chat Completions extractor
  • Verifies graceful fallback when request context is missing
autorouter.go (3)

250-270: LGTM!

Good fix for the Responses API streaming issue. Moving apiType detection before the stream_options modification and adding the apiType != APITypeResponses guard correctly prevents injecting stream_options.include_usage into Responses API requests, which would cause 400 errors.


427-434: LGTM!

Clean integration of WebSocket upgrade handling into ServeHTTP:

  • Checks all three conditions before routing to WebSocket handler
  • Properly guards error response with headerSent check (important since WebSocket upgrade may have already written headers)

512-516: LGTM!

The isWebSocketUpgrade helper correctly identifies WebSocket upgrade requests using case-insensitive header checks and strings.Contains to handle multi-value headers like Connection: keep-alive, upgrade.

DESIGN.md (5)

107-108: LGTM!

The WebSocket configuration options are well-documented and align with the AutoRouter struct fields shown in the context snippet (wsUpgrader, wsDialer, wsBillingCallback).


467-483: LGTM!

Clear documentation of the Responses API streaming format and the automatic stream_options skipping behavior. The SSE event examples accurately reflect the Responses API protocol.


542-576: LGTM!

The WebSocket adapter pattern is well-documented with clear interface definitions. The zero-dependency approach and gorilla/websocket compatibility notes are helpful for consumers.


580-646: LGTM!

The gorilla/websocket adapter example is practical and correctly demonstrates that *websocket.Conn satisfies WSConn directly while Upgrader and Dialer need thin wrappers.


697-715: LGTM!

Clear documentation of per-turn billing semantics and model prefix stripping behavior for WebSocket mode.

streaming.go (5)

91-97: LGTM!

The ReasoningTokens field addition to StreamingUsage is consistent with the PR objective to expose reasoning tokens across all extraction paths.


204-231: LGTM!

The Responses API streaming types correctly model the OpenAI Responses SSE event structure, including nested token details for cached and reasoning tokens.


247-263: LGTM!

ParseResponsesSSEEvent follows the same pattern as ParseOpenAISSEEvent — trimming whitespace, handling [DONE] with ErrStreamComplete, and unmarshaling JSON.


286-288: LGTM!

Reasoning tokens are correctly extracted from CompletionTokensDetails when present and greater than zero.


334-365: LGTM!

ExtractUsageFromResponsesEvent correctly extracts usage only from response.completed events, maps OpenAI's input_tokens/output_tokens naming to the canonical PromptTokens/CompletionTokens, and handles optional cache and reasoning token details.

autorouter_websocket_test.go (11)

18-78: LGTM!

The mockWSConn implementation correctly simulates bidirectional WebSocket communication with proper close coordination using atomic.Bool and channels. The closeFromPeer method ensures both ends close when either side disconnects.


80-111: LGTM!

The mockWSUpgrader and mockWSDialer correctly implement the WSUpgrader and WSDialer interfaces for test purposes, capturing dialed URLs and headers for verification.


126-158: LGTM!

The wsTestProvider helper creates a properly configured mock WebSocket-capable provider with sensible defaults for parsing, enriching, and URL resolution.


198-235: LGTM!

The mustReadFrame and mustReadError helpers include appropriate timeouts (2 seconds) to prevent test hangs while providing clear failure messages.


251-282: LGTM!

TestForwardWebSocket_BasicRelay provides good end-to-end coverage of the WebSocket relay flow, validating message forwarding in both directions and proper cleanup on close.


284-356: LGTM!

Tests for usage extraction, cache usage, and reasoning tokens correctly validate that the billing callback receives properly populated ResponseMetadata with token counts and custom fields.


358-403: LGTM!

TestForwardWebSocket_ModelPrefixStripping and TestForwardWebSocket_MultiTurn validate critical behaviors: provider prefix removal from model names and correct turn counting across multiple request/response cycles.


405-437: LGTM!

TestForwardWebSocket_BillingCallback correctly tests the integration with BillingCalculator, verifying that costs are computed and passed to the callback.


439-520: LGTM!

Good coverage of edge cases: client close, upstream close, error event passthrough, missing WebSocket configuration, and non-WebSocket-capable provider detection.


522-569: LGTM!

TestForwardWebSocket_PassthroughNonCreateMessages validates byte-for-byte passthrough of non-response.create messages, and TestServeHTTP_WebSocketDetection correctly tests the WebSocket upgrade detection path.


571-609: LGTM!

TestServeHTTP_NonWebSocketUnchanged validates that regular HTTP POST requests are unaffected when WebSocket mode is configured, ensuring no regression in normal request handling.

providers/openai_compatible/responses_streaming_extractor.go (4)

15-35: LGTM!

Clean composition pattern embedding ResponsesExtractor and proper dispatch between streaming and non-streaming paths based on content type.


37-79: LGTM!

The non-streaming fallback correctly uses TeeReader to extract metadata while simultaneously writing the response to the client. The 512KB buffer and chunked flush pattern match the codebase conventions mentioned in the DESIGN.md.


81-132: LGTM!

SSE parsing correctly sets streaming headers, uses appropriately sized scanner buffers (64KB initial, 1MB max), and handles the [DONE] marker and parse errors gracefully without breaking the stream.


134-179: Consider extracting reasoning_tokens once instead of twice.

Reasoning tokens are extracted both at line 149-151 (from the response object during event processing) and again at lines 174-176 (from accumulatedUsage). While this works correctly (the second assignment will overwrite), it's slightly redundant.

However, this redundancy ensures both paths are covered if the response structure varies, so this is acceptable as-is.

providers/openai_compatible/responses_streaming_extractor_test.go (5)

14-29: LGTM!

Clean test helper that encapsulates response setup, extraction, and result capture for reuse across test cases.


31-66: LGTM!

Comprehensive lifecycle test covering the full Responses API event sequence from response.created through response.completed, validating both passthrough accuracy and metadata extraction.


68-122: LGTM!

Good coverage of usage extraction variations including basic token counts, cache usage with input_tokens_details.cached_tokens, and reasoning tokens with output_tokens_details.reasoning_tokens.


124-204: LGTM!

Edge case tests are thorough: function call streaming, error event passthrough, no response.completed event, empty stream with only [DONE], and malformed event handling that continues forwarding.


206-278: LGTM!

Good test coverage for non-streaming fallback, IsStreamingResponse content type detection, event: prefix handling in SSE, and byte-accurate passthrough including comment lines (: ping).

autorouter_websocket.go (8)

17-41: LGTM!

The initial setup correctly validates WebSocket configuration, upgrades the connection, reads the first message, and validates it's a response.create message before proceeding.


43-70: LGTM!

Provider detection correctly reuses the existing detector and modelProviderLookup infrastructure, with proper fallback handling and WebSocket capability check via type assertion.


72-118: LGTM!

Model prefix stripping, metadata parsing, URL resolution, header cloning, and request enrichment are all handled correctly before dialing the upstream WebSocket.


120-131: LGTM!

The sync.Once-guarded closeBoth function ensures both connections are closed exactly once, preventing double-close errors and ensuring proper cleanup regardless of which relay goroutine exits first.


133-180: LGTM!

The model state is properly protected with sync.RWMutex for concurrent access from both relay goroutines. The error channel pattern correctly collects errors from both goroutines, filtering out expected close errors.


183-211: LGTM!

relayClientToUpstream correctly handles model prefix stripping for subsequent response.create messages and passes through all other messages byte-for-byte.


213-261: LGTM!

relayUpstreamToClient correctly extracts usage from response.completed events, increments the turn counter, populates ResponseMetadata with usage and custom fields (cache_usage, reasoning_tokens), calculates billing, and invokes the callback. Messages are forwarded regardless of usage extraction.


264-289: LGTM!

Helper functions are simple and correct: rewriteWSCreateModel preserves all fields while updating model, cloneHeader creates a deep copy, and isWSRelayCloseError correctly identifies expected close conditions.

websocket.go (5)

10-17: LGTM!

RFC 6455 message type constants are correctly defined with standard values. These match gorilla/websocket constants, ensuring seamless interoperability.


19-52: LGTM!

Clean interface definitions that enable the zero-dependency adapter pattern. The WebSocketCapableProvider extension of Provider is a proper interface composition pattern.


54-70: LGTM!

WSMessage captures the essential fields needed for routing decisions, and ParseWSMessage correctly preserves the raw JSON for later rewriting while extracting commonly accessed fields.


72-98: LGTM!

The response structure types correctly model the OpenAI Responses API WebSocket payload format, including the nested envelope pattern where usage can appear at either top level or under response.


100-135: LGTM!

ExtractWSUsage correctly handles both usage locations (top-level and nested under response), extracts cache and reasoning token details when present and non-zero, and returns nil, nil for non-response.completed events as documented.

…n examples

- NewResolver now strips trailing /v1 from base URLs to prevent double
  /v1/v1/responses paths in both HTTP and WebSocket URL resolution
- README and DESIGN.md examples use origin whitelist instead of open CheckOrigin
- Fix bare fenced code block in DESIGN.md WebSocket flow diagram
@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
DESIGN.md (1)

483-506: Clarify stream_options injection scope to avoid mixed guidance.

Line 483 says Responses API skips stream_options.include_usage, but Lines 496-506 still describe auto-injection in unconditional terms. Please scope that section to non-Responses streaming (e.g., Chat Completions/OpenAI-compatible).

Suggested doc fix
-When `BillingCalculator` is configured and the request has `stream: true`, the proxy automatically injects:
+When `BillingCalculator` is configured and the request has `stream: true`, the proxy may inject:
@@
-This ensures OpenAI returns token usage in the streaming response for billing calculation.
+This is applied for APIs that require it (e.g., Chat Completions/OpenAI-compatible streaming).
+For Responses API streaming, injection is skipped because usage is delivered in `response.completed`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN.md` around lines 483 - 506, Clarify the doc text to scope the
auto-injection behavior: state that the proxy auto-injects {"stream": true,
"stream_options": {"include_usage": true}} only for non-Responses streaming
endpoints (e.g., OpenAI-compatible Chat Completions) when BillingCalculator is
configured and the request has stream: true, and explicitly note that Responses
API requests (which always include usage in response.completed and in Anthropic
message_start/message_delta events) are excluded from this injection; update
references to stream_options.include_usage, BillingCalculator, Responses API,
and the proxy auto-injection paragraph to reflect this scoped behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN.md`:
- Around line 474-475: The doc incorrectly references MultiAPIExtractor; update
the text to use the exact class name introduced in code,
StreamingMultiAPIExtractor, so readers can directly map the design note to the
implementation (replace "MultiAPIExtractor" with "StreamingMultiAPIExtractor" in
the sentence about dispatching based on request context and the
response.completed event).

---

Nitpick comments:
In `@DESIGN.md`:
- Around line 483-506: Clarify the doc text to scope the auto-injection
behavior: state that the proxy auto-injects {"stream": true, "stream_options":
{"include_usage": true}} only for non-Responses streaming endpoints (e.g.,
OpenAI-compatible Chat Completions) when BillingCalculator is configured and the
request has stream: true, and explicitly note that Responses API requests (which
always include usage in response.completed and in Anthropic
message_start/message_delta events) are excluded from this injection; update
references to stream_options.include_usage, BillingCalculator, Responses API,
and the proxy auto-injection paragraph to reflect this scoped behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 38d80730-e1b3-4ccd-a578-bb2a19ceb4da

📥 Commits

Reviewing files that changed from the base of the PR and between 01654a4 and e939cda.

📒 Files selected for processing (4)
  • DESIGN.md
  • README.md
  • providers/openai_compatible/resolver.go
  • providers/openai_compatible/websocket_test.go
✅ Files skipped from review due to trivial changes (2)
  • providers/openai_compatible/resolver.go
  • providers/openai_compatible/websocket_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md
📜 Review details
🔇 Additional comments (1)
DESIGN.md (1)

542-722: WebSocket design section is solid and implementation-aligned.

The adapter interfaces, relay flow, per-turn billing lifecycle, and close semantics are clear and consistent with the implementation snippets.

…jection docs

- Correct MultiAPIExtractor → StreamingMultiAPIExtractor in usage extraction docs
- Clarify that stream_options injection only applies to Chat Completions, not
  Responses API, Anthropic, Bedrock, or Google AI (which include usage natively)
@jhaynie merged commit 1bd7320 into main Apr 16, 2026
2 checks passed
@jhaynie deleted the feat/responses-api-streaming-websocket branch April 16, 2026 03:06