
feat: Responses API streaming, WebSocket mode, and reasoning token support #9

Merged
jhaynie merged 3 commits into main from feat/responses-api-streaming-websocket
Apr 16, 2026

Conversation

@jhaynie (Member) commented Apr 16, 2026

Summary

  • Fix stream_options.include_usage being incorrectly injected for Responses API requests (was causing 400 errors)
  • Add dedicated Responses API streaming extractor with usage extraction from response.completed events
  • Add WebSocket mode for the Responses API with zero-dependency adapter pattern
  • Fix reasoning token visibility across all extraction paths

Changes

Bug Fix: stream_options on Responses API

The proxy was injecting stream_options.include_usage into Responses API streaming requests, which don't support that parameter. Fixed by detecting API type before injection and skipping for Responses API.
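The gating can be sketched as follows — function and field names here are illustrative stand-ins, not the library's actual identifiers; the point is that API-type detection (path-based or body-based, as the tests cover) must run before any injection:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// apiTypeFor guesses the API type from the request path and body.
// Hypothetical helper: the Responses API uses an "input" field where
// Chat Completions uses "messages".
func apiTypeFor(path string, body map[string]any) string {
	if strings.HasSuffix(path, "/responses") {
		return "responses"
	}
	if _, ok := body["input"]; ok {
		return "responses"
	}
	return "chat_completions"
}

// injectStreamOptions adds stream_options.include_usage only for APIs that
// need it; the Responses API rejects the parameter (the 400s this PR fixes)
// and reports usage in response.completed instead.
func injectStreamOptions(apiType string, body map[string]any) {
	if apiType == "responses" {
		return
	}
	if stream, _ := body["stream"].(bool); stream {
		body["stream_options"] = map[string]any{"include_usage": true}
	}
}

func main() {
	chat := map[string]any{"stream": true, "messages": []any{}}
	injectStreamOptions(apiTypeFor("/v1/chat/completions", chat), chat)

	resp := map[string]any{"stream": true, "input": "hi"}
	injectStreamOptions(apiTypeFor("/v1/responses", resp), resp)

	a, _ := json.Marshal(chat)
	b, _ := json.Marshal(resp)
	fmt.Println(string(a)) // chat body gains stream_options
	fmt.Println(string(b)) // responses body is left untouched
}
```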

Responses API Streaming Extractor

New ResponsesStreamingExtractor that understands the Responses API SSE event format (response.created, response.output_text.delta, response.completed, etc.). Extracts usage, model, cache tokens, and reasoning tokens from response.completed events. The StreamingMultiAPIExtractor now dispatches to the correct extractor based on api_type in the request context.
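The usage-bearing event can be parsed with a small struct mirroring OpenAI's wire format — field tags follow the documented Responses API payload, but the struct and function names below are a sketch, not the library's types:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// responsesEvent models the subset of a Responses API SSE payload needed
// for billing: usage plus cached/reasoning token details.
type responsesEvent struct {
	Type     string `json:"type"`
	Response struct {
		Model string `json:"model"`
		Usage struct {
			InputTokens        int `json:"input_tokens"`
			OutputTokens       int `json:"output_tokens"`
			InputTokensDetails struct {
				CachedTokens int `json:"cached_tokens"`
			} `json:"input_tokens_details"`
			OutputTokensDetails struct {
				ReasoningTokens int `json:"reasoning_tokens"`
			} `json:"output_tokens_details"`
		} `json:"usage"`
	} `json:"response"`
}

// parseSSEData extracts usage from one "data: {...}" SSE line when it carries
// a response.completed event; any other event (or [DONE]) returns ok=false.
func parseSSEData(line string) (ev responsesEvent, ok bool) {
	data := strings.TrimPrefix(strings.TrimSpace(line), "data: ")
	if data == "" || data == "[DONE]" {
		return ev, false
	}
	if json.Unmarshal([]byte(data), &ev) != nil || ev.Type != "response.completed" {
		return ev, false
	}
	return ev, true
}

func main() {
	line := `data: {"type":"response.completed","response":{"model":"gpt-4o","usage":{"input_tokens":120,"output_tokens":45,"input_tokens_details":{"cached_tokens":100},"output_tokens_details":{"reasoning_tokens":12}}}}`
	if ev, ok := parseSSEData(line); ok {
		fmt.Println(ev.Response.Usage.InputTokens, ev.Response.Usage.OutputTokensDetails.ReasoningTokens) // 120 12
	}
}
```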

WebSocket Mode (Adapter Pattern)

Implements persistent WebSocket connections for multi-turn Responses API workflows. Uses a zero-dependency adapter pattern — the library defines WSConn, WSUpgrader, and WSDialer interfaces that consumers implement with their preferred WS library (gorilla, nhooyr, etc.). gorilla's *websocket.Conn satisfies WSConn directly.

Features:

  • Bidirectional relay with sync.Once close coordination
  • Per-turn billing via WSBillingCallback
  • Model prefix stripping in response.create messages
  • Opt-in via WithAutoRouterWebSocket(upgrader, dialer)
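The adapter contract can be sketched with the standard library alone. The interface names (WSConn, WSUpgrader, WSDialer) come from the PR; their exact definitions live in websocket.go, and the memConn mock below is purely hypothetical. Any type with these three methods plugs in — gorilla's *websocket.Conn has exactly this method set, which is why it satisfies WSConn with no wrapper, while Upgrader and Dialer need thin ones:

```go
package main

import (
	"fmt"
	"net/http"
)

// Interface shapes mirroring the adapter contracts described in the PR.
type WSConn interface {
	ReadMessage() (messageType int, p []byte, err error)
	WriteMessage(messageType int, data []byte) error
	Close() error
}

type WSUpgrader interface {
	Upgrade(w http.ResponseWriter, r *http.Request, h http.Header) (WSConn, error)
}

type WSDialer interface {
	Dial(url string, h http.Header) (WSConn, *http.Response, error)
}

// memConn is a toy in-memory connection: writes queue frames, reads pop them.
type memConn struct{ frames [][]byte }

func (c *memConn) ReadMessage() (int, []byte, error) {
	if len(c.frames) == 0 {
		return 0, nil, fmt.Errorf("connection closed")
	}
	f := c.frames[0]
	c.frames = c.frames[1:]
	return 1, f, nil // 1 = text message, matching RFC 6455 / gorilla constants
}

func (c *memConn) WriteMessage(_ int, data []byte) error {
	c.frames = append(c.frames, data)
	return nil
}

func (c *memConn) Close() error { return nil }

// Compile-time conformance check, the same guard the PR adds for providers.
var _ WSConn = (*memConn)(nil)

func main() {
	var conn WSConn = &memConn{}
	conn.WriteMessage(1, []byte(`{"type":"response.create"}`))
	_, msg, _ := conn.ReadMessage()
	fmt.Println(string(msg))
}
```

The zero-dependency design keeps the library's go.mod free of any WebSocket implementation; consumers pay the dependency cost only if they opt in.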

Reasoning Token Consistency

Previously only 1 of 5 extraction paths stored reasoning_tokens in metadata. Now all paths consistently expose it via meta.Custom["reasoning_tokens"]:

  • Non-streaming Chat Completions
  • Non-streaming Responses API
  • Streaming Chat Completions
  • Streaming Responses API
  • WebSocket
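Consumers read the count defensively, since the key is only present when the model actually emitted reasoning tokens. A sketch with a stand-in metadata struct (the meta.Custom["reasoning_tokens"] key is from the PR; the struct here is not the library's):

```go
package main

import "fmt"

// ResponseMetadata stand-in: the real type carries usage plus a Custom map
// of extractor-specific fields.
type ResponseMetadata struct {
	InputTokens  int
	OutputTokens int
	Custom       map[string]any
}

// reasoningTokens reads the count defensively: absent key or wrong type
// yields zero rather than a panic.
func reasoningTokens(meta ResponseMetadata) int {
	if v, ok := meta.Custom["reasoning_tokens"].(int); ok {
		return v
	}
	return 0
}

func main() {
	meta := ResponseMetadata{
		InputTokens:  120,
		OutputTokens: 45,
		Custom:       map[string]any{"reasoning_tokens": 12},
	}
	fmt.Println(reasoningTokens(meta)) // 12
}
```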

Documentation

  • Updated DESIGN.md with WebSocket mode section, flow diagram, and gorilla example
  • Updated README.md with WebSocket section and complete Go + Python examples

Test Coverage

  • 60+ new tests across 9 test files
  • Responses API streaming: 13 tests (lifecycle, usage, cache, reasoning, function calls, errors, passthrough)
  • WebSocket: 15 tests (relay, multi-turn, billing, close handling, prefix stripping)
  • Reasoning tokens: 12 tests across all extraction paths
  • MultiAPI dispatch: 3 tests (context routing, fallback)
  • stream_options fix: 2 subtests (path-based, body-based detection)

All tests pass: go test ./...

Summary by CodeRabbit

  • New Features

    • Opt-in WebSocket mode for persistent multi-turn workflows with per-turn billing and callbacks; Responses API now supports both HTTP streaming and WebSocket.
  • Enhancements

    • Improved usage extraction to include reasoning-token counts and cached-token reporting.
    • Router no longer injects stream_options.include_usage for Responses/other non-ChatCompletions providers.
  • Documentation

    • README and design docs updated with WebSocket Mode examples and streaming behavior guidance.
  • Tests

    • Extensive new tests for streaming, WebSocket relay, usage extraction, and billing.

…n support

- Fix stream_options.include_usage injection for Responses API requests
- Add dedicated ResponsesStreamingExtractor for SSE streaming usage extraction
- Add WebSocket mode with zero-dependency adapter pattern (WSConn, WSUpgrader, WSDialer)
- Implement bidirectional relay with per-turn billing and model prefix stripping
- Add consistent reasoning_tokens extraction across all 5 extraction paths
- Update DESIGN.md and README.md with WebSocket docs and gorilla example
- Add 60+ new tests covering streaming, WebSocket, and reasoning tokens
@coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 13790c8b-73b4-4fd9-91dc-8acf5da48d37

📥 Commits

Reviewing files that changed from the base of the PR and between e939cda and b6e0862.

📒 Files selected for processing (1)
  • DESIGN.md
✅ Files skipped from review due to trivial changes (1)
  • DESIGN.md

📝 Walkthrough

Walkthrough

Adds a WebSocket mode for forwarding OpenAI Responses (client↔upstream) with per‑turn billing callbacks, SSE and WebSocket streaming extractors (including reasoning/cache token capture), WebSocket abstractions/helpers, provider WebSocket URL resolution, model rewriting, ServeHTTP WS detection, and extensive tests and docs updates.

Changes

Cohort / File(s) Summary
Documentation
DESIGN.md, README.md
Documented WebSocket Mode, added examples (Go + Python), updated streaming usage guidance distinguishing Chat Completions vs Responses, and described WS billing callback and injector behavior.
AutoRouter core & options
autorouter.go
Added wsUpgrader, wsDialer, wsBillingCallback fields; new options WithAutoRouterWebSocket and WithAutoRouterWSBillingCallback; WS upgrade detection; gate stream_options injection by API type.
AutoRouter WebSocket relay
autorouter_websocket.go, autorouter_websocket_test.go, autorouter_test.go
Implemented ForwardWebSocket, client↔upstream bidirectional relay, initial response.create validation/model rewriting, metadata enrichment (api_type=responses/provider/model), usage extraction, per‑turn billing callback, connection lifecycle handling, and comprehensive WS tests (relay, billing, edge cases).
WebSocket abstractions & parsing
websocket.go, websocket_test.go
Added WSConn/WSUpgrader/WSDialer interfaces, WebSocketCapableProvider interface, WS message structs, ParseWSMessage, and ExtractWSUsage (maps Responses response.completed usage incl. cached/reasoning tokens).
OpenAI provider WebSocket support
providers/openai/provider.go, providers/openai_compatible/provider.go
Added compile-time assertions for WebSocket capability; implemented WebSocketURL(meta) delegation on openai_compatible.Provider.
OpenAI-compatible resolver & WebSocket URL
providers/openai_compatible/websocket.go, providers/openai_compatible/websocket_test.go, providers/openai_compatible/resolver.go
Added Resolver.WebSocketURL converting http(s)→ws(s) and appending /v1/responses; normalized base URL parsing to trim trailing slashes and strip terminal /v1; unit tests for URL behavior.
Responses SSE streaming extractor
providers/openai_compatible/responses_streaming_extractor.go, providers/openai_compatible/responses_streaming_extractor_test.go
New ResponsesStreamingExtractor that handles SSE line-by-line passthrough, parses response events, accumulates usage across events, and exposes metadata (including reasoning/cache tokens); extensive streaming tests.
Streaming types & usage extraction
streaming.go, streaming_test.go
Added Responses SSE types/events, ParseResponsesSSEEvent, ExtractUsageFromResponsesEvent, and StreamingUsage.ReasoningTokens; tests for parsing and usage extraction.
Extractor enhancements & dispatch
providers/openai_compatible/streaming_extractor.go, providers/openai_compatible/multiapi.go, providers/openai_compatible/extractor.go, providers/openai_compatible/responses_extractor.go, providers/openai_compatible/*_test.go
Mapped completion/response reasoning tokens and cached token details into meta.Custom; StreamingMultiAPIExtractor now routes Responses SSE to the new extractor; added/updated tests validating reasoning/cache extraction and dispatch.
Tests: integration & unit
autorouter_websocket_test.go, websocket_test.go, streaming_test.go, providers/.../*_test.go
Added many unit/integration tests covering WS relay behavior, SSE/WebSocket parsing, usage/reasoning/cache token propagation, billing callback invocation, model prefix stripping, and ServeHTTP WS detection.
🚥 Pre-merge checks | ✅ 1
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.




@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
DESIGN.md (1)

662-695: Add language specifier to the flow diagram code block.

The static analysis tool flagged this fenced code block as missing a language specifier.

📝 Proposed fix
-```
+```text
 +------------------+        +------------------+        +------------------+
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN.md` around lines 662 - 695, The fenced diagram in the "WebSocket Flow"
section is missing a language specifier causing static analysis to flag it;
update the opening fence for the diagram (the triple backticks that begin the
code block under "WebSocket Flow") to include a language like "text" (i.e.,
change ``` to ```text) so the block is explicitly typed; ensure any other
adjacent fenced diagram blocks in that section are similarly updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@providers/openai_compatible/websocket_test.go`:
- Around line 51-63: The WebSocket URL builder currently duplicates "/v1" when a
BaseURL already ends with that suffix; update NewResolver (or the resolver
initialization) to normalize the provided base URL by removing any trailing
"/v1" or "/v1/" before storing it, so that subsequent calls to
r.WebSocketURL(llmproxy.BodyMetadata{}) append a single "/v1/responses"; locate
the normalization logic in NewResolver and ensure it trims a trailing "/v1"
(case-sensitive) and any extra slash, or add a small helper (e.g.,
normalizeBaseURL) used by NewResolver and referenced by WebSocketURL to prevent
double "/v1" segments.
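A minimal sketch of the normalization this comment asks for — helper names (normalizeBaseURL, webSocketURL) are taken from the review's suggestion, not the codebase:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeBaseURL trims a trailing slash and a terminal "/v1" so later
// joins append exactly one "/v1/responses" segment.
func normalizeBaseURL(base string) string {
	base = strings.TrimRight(base, "/")
	base = strings.TrimSuffix(base, "/v1")
	return base
}

// webSocketURL converts http(s) to ws(s) and appends /v1/responses.
func webSocketURL(base string) (string, error) {
	u, err := url.Parse(normalizeBaseURL(base))
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "https":
		u.Scheme = "wss"
	case "http":
		u.Scheme = "ws"
	}
	return u.JoinPath("v1", "responses").String(), nil
}

func main() {
	// With or without a trailing /v1, both bases resolve identically.
	for _, base := range []string{"https://api.openai.com/v1", "https://api.openai.com"} {
		ws, _ := webSocketURL(base)
		fmt.Println(ws) // wss://api.openai.com/v1/responses
	}
}
```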

In `@providers/openai_compatible/websocket.go`:
- Line 27: The ws URL builder currently always appends "v1/responses" via
u.JoinPath("v1", "responses"), causing a duplicate /v1 when BaseURL already ends
with /v1; modify the logic that builds the websocket path to check the parsed
URL's Path (e.g., u.Path or url.Path) and if it already has a trailing "/v1"
(use strings.HasSuffix(u.Path, "/v1") or normalize trailing slashes) then call
u.JoinPath("responses") (or join only "responses"), otherwise call
u.JoinPath("v1", "responses"); ensure you normalize slashes so neither double
nor missing slashes occur and keep the return signature the same.

In `@README.md`:
- Around line 231-233: The README example uses an unsafe CheckOrigin that
unconditionally returns true; update the gorillaUpgrader/websocket.Upgrader
CheckOrigin implementation to validate the request Origin header against a
whitelist of trusted origins (or use same-origin checks) before allowing the
upgrade. Replace the unconditional return true with logic in the CheckOrigin
callback that reads r.Header.Get("Origin") and compares it to a configured list
(or derives allowed origin from the request) and only returns true for matches;
mention using a configurable trustedOrigins list and the
gorillaUpgrader/websocket.Upgrader symbols so readers can copy a secure pattern
for production.

---

Nitpick comments:
In `@DESIGN.md`:
- Around line 662-695: The fenced diagram in the "WebSocket Flow" section is
missing a language specifier causing static analysis to flag it; update the
opening fence for the diagram (the triple backticks that begin the code block
under "WebSocket Flow") to include a language like "text" (i.e., change ``` to
```text) so the block is explicitly typed; ensure any other adjacent fenced
diagram blocks in that section are similarly updated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b82b7fc-75fd-4fc9-a320-547e4cac3f59

📥 Commits

Reviewing files that changed from the base of the PR and between 94d029e and 01654a4.

📒 Files selected for processing (23)
  • DESIGN.md
  • README.md
  • autorouter.go
  • autorouter_test.go
  • autorouter_websocket.go
  • autorouter_websocket_test.go
  • providers/openai/provider.go
  • providers/openai_compatible/extractor.go
  • providers/openai_compatible/extractor_test.go
  • providers/openai_compatible/multiapi.go
  • providers/openai_compatible/provider.go
  • providers/openai_compatible/responses_extractor.go
  • providers/openai_compatible/responses_streaming_extractor.go
  • providers/openai_compatible/responses_streaming_extractor_test.go
  • providers/openai_compatible/responses_test.go
  • providers/openai_compatible/streaming_extractor.go
  • providers/openai_compatible/streaming_extractor_test.go
  • providers/openai_compatible/websocket.go
  • providers/openai_compatible/websocket_test.go
  • streaming.go
  • streaming_test.go
  • websocket.go
  • websocket_test.go
📜 Review details
🧰 Additional context used
🪛 markdownlint-cli2 (0.22.0)
DESIGN.md

[warning] 664-664: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (60)
providers/openai_compatible/responses_extractor.go (1)

61-63: Good reasoning-token extraction guardrails.

Nil-check + positive-value guard is consistent with existing metadata extraction and avoids noisy zero-value fields.

providers/openai_compatible/extractor.go (1)

55-57: Reasoning token propagation looks correct.

This keeps non-streaming chat-completions extraction aligned with the other extractor paths.

providers/openai_compatible/streaming_extractor.go (1)

176-178: Streaming reasoning-token metadata wiring is solid.

The field is emitted only when available, alongside other accumulated usage metadata.

providers/openai/provider.go (1)

13-17: Nice compile-time interface conformance check.

This is a good guard to prevent regressions in WebSocket-capable provider behavior.

providers/openai_compatible/multiapi.go (1)

82-106: Streaming extractor dispatch update looks good.

The dedicated Responses SSE path is now explicitly routed by api_type, with safe fallback behavior.

autorouter_test.go (1)

739-822: Great regression coverage for Responses streaming stream-options behavior.

Both path-based and body-based detection cases are covered and validate the intended non-injection behavior.

providers/openai_compatible/provider.go (1)

74-87: LGTM!

The WebSocketURL method implementation is clean and follows good Go patterns:

  • Proper nil check on resolver
  • Type assertion to optional interface for WebSocket capability
  • Clear error messages for both failure modes
providers/openai_compatible/extractor_test.go (1)

12-135: LGTM!

Comprehensive test coverage for reasoning token extraction:

  • Validates non-zero reasoning tokens are extracted and stored as int
  • Confirms zero reasoning tokens are omitted from metadata (avoiding noise)
  • Tests combined extraction of cache usage and reasoning tokens
providers/openai_compatible/streaming_extractor_test.go (1)

149-230: LGTM!

Well-structured streaming tests for reasoning token extraction that mirror the non-streaming tests. The SSE format is realistic and validates that reasoning tokens flow correctly through the streaming extraction path.

websocket_test.go (1)

1-100: LGTM!

Solid test coverage for WebSocket utilities:

  • ParseWSMessage tests cover various message types and error cases
  • ExtractWSUsage tests validate token extraction including cache and reasoning tokens
  • Malformed JSON error handling is properly tested
streaming_test.go (2)

217-260: LGTM!

Good extension of the OpenAI chunk usage extraction tests to cover reasoning tokens, including the combined cache + reasoning scenario.


611-808: LGTM!

Comprehensive test suite for Responses API SSE parsing:

  • Event type parsing (response.created, response.output_text.delta, response.completed)
  • Edge cases (empty input, [DONE] marker, malformed JSON)
  • Usage extraction with all token detail variants (cached, reasoning)
providers/openai_compatible/responses_test.go (2)

1049-1056: LGTM!

Good addition of explicit reasoning token verification in the Responses extractor test, ensuring consistency with other extraction paths.


1720-1825: LGTM!

Excellent test coverage for the streaming multi-API extractor dispatch logic:

  • Validates correct dispatch to Responses API extractor based on context
  • Validates dispatch to Chat Completions extractor
  • Verifies graceful fallback when request context is missing
autorouter.go (3)

250-270: LGTM!

Good fix for the Responses API streaming issue. Moving apiType detection before the stream_options modification and adding the apiType != APITypeResponses guard correctly prevents injecting stream_options.include_usage into Responses API requests, which would cause 400 errors.


427-434: LGTM!

Clean integration of WebSocket upgrade handling into ServeHTTP:

  • Checks all three conditions before routing to WebSocket handler
  • Properly guards error response with headerSent check (important since WebSocket upgrade may have already written headers)

512-516: LGTM!

The isWebSocketUpgrade helper correctly identifies WebSocket upgrade requests using case-insensitive header checks and strings.Contains to handle multi-value headers like Connection: keep-alive, upgrade.

DESIGN.md (5)

107-108: LGTM!

The WebSocket configuration options are well-documented and align with the AutoRouter struct fields shown in the context snippet (wsUpgrader, wsDialer, wsBillingCallback).


467-483: LGTM!

Clear documentation of the Responses API streaming format and the automatic stream_options skipping behavior. The SSE event examples accurately reflect the Responses API protocol.


542-576: LGTM!

The WebSocket adapter pattern is well-documented with clear interface definitions. The zero-dependency approach and gorilla/websocket compatibility notes are helpful for consumers.


580-646: LGTM!

The gorilla/websocket adapter example is practical and correctly demonstrates that *websocket.Conn satisfies WSConn directly while Upgrader and Dialer need thin wrappers.


697-715: LGTM!

Clear documentation of per-turn billing semantics and model prefix stripping behavior for WebSocket mode.

streaming.go (5)

91-97: LGTM!

The ReasoningTokens field addition to StreamingUsage is consistent with the PR objective to expose reasoning tokens across all extraction paths.


204-231: LGTM!

The Responses API streaming types correctly model the OpenAI Responses SSE event structure, including nested token details for cached and reasoning tokens.


247-263: LGTM!

ParseResponsesSSEEvent follows the same pattern as ParseOpenAISSEEvent — trimming whitespace, handling [DONE] with ErrStreamComplete, and unmarshaling JSON.


286-288: LGTM!

Reasoning tokens are correctly extracted from CompletionTokensDetails when present and greater than zero.


334-365: LGTM!

ExtractUsageFromResponsesEvent correctly extracts usage only from response.completed events, maps OpenAI's input_tokens/output_tokens naming to the canonical PromptTokens/CompletionTokens, and handles optional cache and reasoning token details.

autorouter_websocket_test.go (11)

18-78: LGTM!

The mockWSConn implementation correctly simulates bidirectional WebSocket communication with proper close coordination using atomic.Bool and channels. The closeFromPeer method ensures both ends close when either side disconnects.


80-111: LGTM!

The mockWSUpgrader and mockWSDialer correctly implement the WSUpgrader and WSDialer interfaces for test purposes, capturing dialed URLs and headers for verification.


126-158: LGTM!

The wsTestProvider helper creates a properly configured mock WebSocket-capable provider with sensible defaults for parsing, enriching, and URL resolution.


198-235: LGTM!

The mustReadFrame and mustReadError helpers include appropriate timeouts (2 seconds) to prevent test hangs while providing clear failure messages.


251-282: LGTM!

TestForwardWebSocket_BasicRelay provides good end-to-end coverage of the WebSocket relay flow, validating message forwarding in both directions and proper cleanup on close.


284-356: LGTM!

Tests for usage extraction, cache usage, and reasoning tokens correctly validate that the billing callback receives properly populated ResponseMetadata with token counts and custom fields.


358-403: LGTM!

TestForwardWebSocket_ModelPrefixStripping and TestForwardWebSocket_MultiTurn validate critical behaviors: provider prefix removal from model names and correct turn counting across multiple request/response cycles.


405-437: LGTM!

TestForwardWebSocket_BillingCallback correctly tests the integration with BillingCalculator, verifying that costs are computed and passed to the callback.


439-520: LGTM!

Good coverage of edge cases: client close, upstream close, error event passthrough, missing WebSocket configuration, and non-WebSocket-capable provider detection.


522-569: LGTM!

TestForwardWebSocket_PassthroughNonCreateMessages validates byte-for-byte passthrough of non-response.create messages, and TestServeHTTP_WebSocketDetection correctly tests the WebSocket upgrade detection path.


571-609: LGTM!

TestServeHTTP_NonWebSocketUnchanged validates that regular HTTP POST requests are unaffected when WebSocket mode is configured, ensuring no regression in normal request handling.

providers/openai_compatible/responses_streaming_extractor.go (4)

15-35: LGTM!

Clean composition pattern embedding ResponsesExtractor and proper dispatch between streaming and non-streaming paths based on content type.


37-79: LGTM!

The non-streaming fallback correctly uses TeeReader to extract metadata while simultaneously writing the response to the client. The 512KB buffer and chunked flush pattern match the codebase conventions mentioned in the DESIGN.md.


81-132: LGTM!

SSE parsing correctly sets streaming headers, uses appropriately sized scanner buffers (64KB initial, 1MB max), and handles the [DONE] marker and parse errors gracefully without breaking the stream.


134-179: Consider extracting reasoning_tokens once instead of twice.

Reasoning tokens are extracted both at line 149-151 (from the response object during event processing) and again at lines 174-176 (from accumulatedUsage). While this works correctly (the second assignment will overwrite), it's slightly redundant.

However, this redundancy ensures both paths are covered if the response structure varies, so this is acceptable as-is.

providers/openai_compatible/responses_streaming_extractor_test.go (5)

14-29: LGTM!

Clean test helper that encapsulates response setup, extraction, and result capture for reuse across test cases.


31-66: LGTM!

Comprehensive lifecycle test covering the full Responses API event sequence from response.created through response.completed, validating both passthrough accuracy and metadata extraction.


68-122: LGTM!

Good coverage of usage extraction variations including basic token counts, cache usage with input_tokens_details.cached_tokens, and reasoning tokens with output_tokens_details.reasoning_tokens.


124-204: LGTM!

Edge case tests are thorough: function call streaming, error event passthrough, no response.completed event, empty stream with only [DONE], and malformed event handling that continues forwarding.


206-278: LGTM!

Good test coverage for non-streaming fallback, IsStreamingResponse content type detection, event: prefix handling in SSE, and byte-accurate passthrough including comment lines (: ping).

autorouter_websocket.go (8)

17-41: LGTM!

The initial setup correctly validates WebSocket configuration, upgrades the connection, reads the first message, and validates it's a response.create message before proceeding.


43-70: LGTM!

Provider detection correctly reuses the existing detector and modelProviderLookup infrastructure, with proper fallback handling and WebSocket capability check via type assertion.


72-118: LGTM!

Model prefix stripping, metadata parsing, URL resolution, header cloning, and request enrichment are all handled correctly before dialing the upstream WebSocket.


120-131: LGTM!

The sync.Once-guarded closeBoth function ensures both connections are closed exactly once, preventing double-close errors and ensuring proper cleanup regardless of which relay goroutine exits first.


133-180: LGTM!

The model state is properly protected with sync.RWMutex for concurrent access from both relay goroutines. The error channel pattern correctly collects errors from both goroutines, filtering out expected close errors.


183-211: LGTM!

relayClientToUpstream correctly handles model prefix stripping for subsequent response.create messages and passes through all other messages byte-for-byte.


213-261: LGTM!

relayUpstreamToClient correctly extracts usage from response.completed events, increments the turn counter, populates ResponseMetadata with usage and custom fields (cache_usage, reasoning_tokens), calculates billing, and invokes the callback. Messages are forwarded regardless of usage extraction.


264-289: LGTM!

Helper functions are simple and correct: rewriteWSCreateModel preserves all fields while updating model, cloneHeader creates a deep copy, and isWSRelayCloseError correctly identifies expected close conditions.

websocket.go (5)

10-17: LGTM!

RFC 6455 message type constants are correctly defined with standard values. These match gorilla/websocket constants, ensuring seamless interoperability.


19-52: LGTM!

Clean interface definitions that enable the zero-dependency adapter pattern. The WebSocketCapableProvider extension of Provider is a proper interface composition pattern.


54-70: LGTM!

WSMessage captures the essential fields needed for routing decisions, and ParseWSMessage correctly preserves the raw JSON for later rewriting while extracting commonly accessed fields.


72-98: LGTM!

The response structure types correctly model the OpenAI Responses API WebSocket payload format, including the nested envelope pattern where usage can appear at either top level or under response.


100-135: LGTM!

ExtractWSUsage correctly handles both usage locations (top-level and nested under response), extracts cache and reasoning token details when present and non-zero, and returns nil, nil for non-response.completed events as documented.

…n examples

- NewResolver now strips trailing /v1 from base URLs to prevent double
  /v1/v1/responses paths in both HTTP and WebSocket URL resolution
- README and DESIGN.md examples use origin whitelist instead of open CheckOrigin
- Fix bare fenced code block in DESIGN.md WebSocket flow diagram
@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
DESIGN.md (1)

483-506: Clarify stream_options injection scope to avoid mixed guidance.

Line 483 says Responses API skips stream_options.include_usage, but Lines 496-506 still describe auto-injection in unconditional terms. Please scope that section to non-Responses streaming (e.g., Chat Completions/OpenAI-compatible).

Suggested doc fix
-When `BillingCalculator` is configured and the request has `stream: true`, the proxy automatically injects:
+When `BillingCalculator` is configured and the request has `stream: true`, the proxy may inject:
@@
-This ensures OpenAI returns token usage in the streaming response for billing calculation.
+This is applied for APIs that require it (e.g., Chat Completions/OpenAI-compatible streaming).
+For Responses API streaming, injection is skipped because usage is delivered in `response.completed`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN.md` around lines 483 - 506, Clarify the doc text to scope the
auto-injection behavior: state that the proxy auto-injects {"stream": true,
"stream_options": {"include_usage": true}} only for non-Responses streaming
endpoints (e.g., OpenAI-compatible Chat Completions) when BillingCalculator is
configured and the request has stream: true, and explicitly note that Responses
API requests (which always include usage in response.completed and in Anthropic
message_start/message_delta events) are excluded from this injection; update
references to stream_options.include_usage, BillingCalculator, Responses API,
and the proxy auto-injection paragraph to reflect this scoped behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN.md`:
- Around line 474-475: The doc incorrectly references MultiAPIExtractor; update
the text to use the exact class name introduced in code,
StreamingMultiAPIExtractor, so readers can directly map the design note to the
implementation (replace "MultiAPIExtractor" with "StreamingMultiAPIExtractor" in
the sentence about dispatching based on request context and the
response.completed event).

---

Nitpick comments:
In `@DESIGN.md`:
- Around line 483-506: Clarify the doc text to scope the auto-injection
behavior: state that the proxy auto-injects {"stream": true, "stream_options":
{"include_usage": true}} only for non-Responses streaming endpoints (e.g.,
OpenAI-compatible Chat Completions) when BillingCalculator is configured and the
request has stream: true, and explicitly note that Responses API requests (which
always include usage in response.completed and in Anthropic
message_start/message_delta events) are excluded from this injection; update
references to stream_options.include_usage, BillingCalculator, Responses API,
and the proxy auto-injection paragraph to reflect this scoped behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 38d80730-e1b3-4ccd-a578-bb2a19ceb4da

📥 Commits

Reviewing files that changed from the base of the PR and between 01654a4 and e939cda.

📒 Files selected for processing (4)
  • DESIGN.md
  • README.md
  • providers/openai_compatible/resolver.go
  • providers/openai_compatible/websocket_test.go
✅ Files skipped from review due to trivial changes (2)
  • providers/openai_compatible/resolver.go
  • providers/openai_compatible/websocket_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md
📜 Review details
🔇 Additional comments (1)
DESIGN.md (1)

542-722: WebSocket design section is solid and implementation-aligned.

The adapter interfaces, relay flow, per-turn billing lifecycle, and close semantics are clear and consistent with the implementation snippets.

…jection docs

- Correct MultiAPIExtractor → StreamingMultiAPIExtractor in usage extraction docs
- Clarify that stream_options injection only applies to Chat Completions, not
  Responses API, Anthropic, Bedrock, or Google AI (which include usage natively)
@jhaynie merged commit 1bd7320 into main Apr 16, 2026
2 checks passed
@jhaynie deleted the feat/responses-api-streaming-websocket branch April 16, 2026 03:06