Conversation
…n support
- Fix stream_options.include_usage injection for Responses API requests
- Add dedicated ResponsesStreamingExtractor for SSE streaming usage extraction
- Add WebSocket mode with zero-dependency adapter pattern (WSConn, WSUpgrader, WSDialer)
- Implement bidirectional relay with per-turn billing and model prefix stripping
- Add consistent reasoning_tokens extraction across all 5 extraction paths
- Update DESIGN.md and README.md with WebSocket docs and gorilla example
- Add 60+ new tests covering streaming, WebSocket, and reasoning tokens
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough
Adds a WebSocket mode for forwarding OpenAI Responses (client↔upstream) with per‑turn billing callbacks, SSE and WebSocket streaming extractors (including reasoning/cache token capture), WebSocket abstractions/helpers, provider WebSocket URL resolution, model rewriting, ServeHTTP WS detection, and extensive tests and docs updates.
Changes
🚥 Pre-merge checks
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 3
🧹 Nitpick comments (1)
DESIGN.md (1)
662-695: Add language specifier to the flow diagram code block. The static analysis tool flagged this fenced code block as missing a language specifier.
📝 Proposed fix
-```
+```text
 +------------------+      +------------------+      +------------------+

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@DESIGN.md` around lines 662 - 695, The fenced diagram in the "WebSocket Flow" section is missing a language specifier causing static analysis to flag it; update the opening fence for the diagram (the triple backticks that begin the code block under "WebSocket Flow") to include a language like "text" (i.e., change ``` to ```text) so the block is explicitly typed; ensure any other adjacent fenced diagram blocks in that section are similarly updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@providers/openai_compatible/websocket_test.go`:
- Around line 51-63: The WebSocket URL builder currently duplicates "/v1" when a
BaseURL already ends with that suffix; update NewResolver (or the resolver
initialization) to normalize the provided base URL by removing any trailing
"/v1" or "/v1/" before storing it, so that subsequent calls to
r.WebSocketURL(llmproxy.BodyMetadata{}) append a single "/v1/responses"; locate
the normalization logic in NewResolver and ensure it trims a trailing "/v1"
(case-sensitive) and any extra slash, or add a small helper (e.g.,
normalizeBaseURL) used by NewResolver and referenced by WebSocketURL to prevent
double "/v1" segments.
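The normalization suggested above can be sketched as follows. This is an illustration, not the repository's actual code: normalizeBaseURL is the hypothetical helper the comment proposes, and the exact trimming rules are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBaseURL sketches the suggested fix: trim a trailing "/v1"
// (with or without a trailing slash) from the configured base URL so
// that resolvers can always append "v1/responses" without producing a
// duplicate "/v1/v1" segment.
func normalizeBaseURL(base string) string {
	base = strings.TrimRight(base, "/")
	base = strings.TrimSuffix(base, "/v1")
	return base
}

func main() {
	fmt.Println(normalizeBaseURL("https://api.openai.com/v1"))  // https://api.openai.com
	fmt.Println(normalizeBaseURL("https://api.openai.com/v1/")) // https://api.openai.com
	fmt.Println(normalizeBaseURL("https://api.openai.com"))     // https://api.openai.com
}
```

With this in place, both HTTP and WebSocket URL resolution can join "v1/responses" unconditionally.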
In `@providers/openai_compatible/websocket.go`:
- Line 27: The ws URL builder currently always appends "v1/responses" via
u.JoinPath("v1", "responses"), causing a duplicate /v1 when BaseURL already ends
with /v1; modify the logic that builds the websocket path to check the parsed
URL's Path (e.g., u.Path or url.Path) and if it already has a trailing "/v1"
(use strings.HasSuffix(u.Path, "/v1") or normalize trailing slashes) then call
u.JoinPath("responses") (or join only "responses"), otherwise call
u.JoinPath("v1", "responses"); ensure you normalize slashes so neither double
nor missing slashes occur and keep the return signature the same.
In `@README.md`:
- Around line 231-233: The README example uses an unsafe CheckOrigin that
unconditionally returns true; update the gorillaUpgrader/websocket.Upgrader
CheckOrigin implementation to validate the request Origin header against a
whitelist of trusted origins (or use same-origin checks) before allowing the
upgrade. Replace the unconditional return true with logic in the CheckOrigin
callback that reads r.Header.Get("Origin") and compares it to a configured list
(or derives allowed origin from the request) and only returns true for matches;
mention using a configurable trustedOrigins list and the
gorillaUpgrader/websocket.Upgrader symbols so readers can copy a secure pattern
for production.
---
Nitpick comments:
In `@DESIGN.md`:
- Around line 662-695: The fenced diagram in the "WebSocket Flow" section is
missing a language specifier causing static analysis to flag it; update the
opening fence for the diagram (the triple backticks that begin the code block
under "WebSocket Flow") to include a language like "text" (i.e., change ``` to
```text) so the block is explicitly typed; ensure any other adjacent fenced
diagram blocks in that section are similarly updated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1b82b7fc-75fd-4fc9-a320-547e4cac3f59
📒 Files selected for processing (23)
- DESIGN.md
- README.md
- autorouter.go
- autorouter_test.go
- autorouter_websocket.go
- autorouter_websocket_test.go
- providers/openai/provider.go
- providers/openai_compatible/extractor.go
- providers/openai_compatible/extractor_test.go
- providers/openai_compatible/multiapi.go
- providers/openai_compatible/provider.go
- providers/openai_compatible/responses_extractor.go
- providers/openai_compatible/responses_streaming_extractor.go
- providers/openai_compatible/responses_streaming_extractor_test.go
- providers/openai_compatible/responses_test.go
- providers/openai_compatible/streaming_extractor.go
- providers/openai_compatible/streaming_extractor_test.go
- providers/openai_compatible/websocket.go
- providers/openai_compatible/websocket_test.go
- streaming.go
- streaming_test.go
- websocket.go
- websocket_test.go
📜 Review details
🧰 Additional context used
🪛 markdownlint-cli2 (0.22.0)
DESIGN.md
[warning] 664-664: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (60)
providers/openai_compatible/responses_extractor.go (1)
61-63: Good reasoning-token extraction guardrails. Nil-check + positive-value guard is consistent with existing metadata extraction and avoids noisy zero-value fields.
providers/openai_compatible/extractor.go (1)
55-57: Reasoning token propagation looks correct. This keeps non-streaming chat-completions extraction aligned with the other extractor paths.
providers/openai_compatible/streaming_extractor.go (1)
176-178: Streaming reasoning-token metadata wiring is solid. The field is emitted only when available, alongside other accumulated usage metadata.
providers/openai/provider.go (1)
13-17: Nice compile-time interface conformance check. This is a good guard to prevent regressions in WebSocket-capable provider behavior.
providers/openai_compatible/multiapi.go (1)
82-106: Streaming extractor dispatch update looks good. The dedicated Responses SSE path is now explicitly routed by api_type, with safe fallback behavior.
autorouter_test.go (1)
739-822: Great regression coverage for Responses streaming stream-options behavior. Both path-based and body-based detection cases are covered and validate the intended non-injection behavior.
providers/openai_compatible/provider.go (1)
74-87: LGTM! The WebSocketURL method implementation is clean and follows good Go patterns:
- Proper nil check on resolver
- Type assertion to optional interface for WebSocket capability
- Clear error messages for both failure modes
providers/openai_compatible/extractor_test.go (1)
12-135: LGTM! Comprehensive test coverage for reasoning token extraction:
- Validates non-zero reasoning tokens are extracted and stored as int
- Confirms zero reasoning tokens are omitted from metadata (avoiding noise)
- Tests combined extraction of cache usage and reasoning tokens
providers/openai_compatible/streaming_extractor_test.go (1)
149-230: LGTM! Well-structured streaming tests for reasoning token extraction that mirror the non-streaming tests. The SSE format is realistic and validates that reasoning tokens flow correctly through the streaming extraction path.
websocket_test.go (1)
1-100: LGTM! Solid test coverage for WebSocket utilities:
- ParseWSMessage tests cover various message types and error cases
- ExtractWSUsage tests validate token extraction including cache and reasoning tokens
- Malformed JSON error handling is properly tested
streaming_test.go (2)
217-260: LGTM! Good extension of the OpenAI chunk usage extraction tests to cover reasoning tokens, including the combined cache + reasoning scenario.
611-808: LGTM! Comprehensive test suite for Responses API SSE parsing:
- Event type parsing (response.created, response.output_text.delta, response.completed)
- Edge cases (empty input, [DONE] marker, malformed JSON)
- Usage extraction with all token detail variants (cached, reasoning)
providers/openai_compatible/responses_test.go (2)
1049-1056: LGTM! Good addition of explicit reasoning token verification in the Responses extractor test, ensuring consistency with other extraction paths.
1720-1825: LGTM! Excellent test coverage for the streaming multi-API extractor dispatch logic:
- Validates correct dispatch to Responses API extractor based on context
- Validates dispatch to Chat Completions extractor
- Verifies graceful fallback when request context is missing
autorouter.go (3)
250-270: LGTM! Good fix for the Responses API streaming issue. Moving apiType detection before the stream_options modification and adding the apiType != APITypeResponses guard correctly prevents injecting stream_options.include_usage into Responses API requests, which would cause 400 errors.
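The guard discussed here can be sketched in isolation. This is a simplified illustration under assumed names: apiTypeResponses and maybeInjectStreamOptions are hypothetical stand-ins, not the proxy's actual identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiTypeResponses is an illustrative constant for the detected API type.
const apiTypeResponses = "responses"

// maybeInjectStreamOptions sketches the fixed ordering: detect the API
// type first, then inject stream_options.include_usage only for
// non-Responses streaming requests (the Responses API rejects that
// parameter with a 400).
func maybeInjectStreamOptions(body map[string]any, apiType string) {
	stream, _ := body["stream"].(bool)
	if !stream || apiType == apiTypeResponses {
		return
	}
	body["stream_options"] = map[string]any{"include_usage": true}
}

func main() {
	chat := map[string]any{"model": "gpt-4o", "stream": true}
	maybeInjectStreamOptions(chat, "chat_completions")
	out, _ := json.Marshal(chat["stream_options"])
	fmt.Println(string(out)) // {"include_usage":true}

	resp := map[string]any{"model": "gpt-4o", "stream": true}
	maybeInjectStreamOptions(resp, apiTypeResponses)
	fmt.Println(resp["stream_options"]) // <nil>
}
```

Usage for Responses API streams is instead read from the response.completed event, so nothing is lost by skipping the injection.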
427-434: LGTM! Clean integration of WebSocket upgrade handling into ServeHTTP:
- Checks all three conditions before routing to WebSocket handler
- Properly guards error response with headerSent check (important since WebSocket upgrade may have already written headers)
512-516: LGTM! The isWebSocketUpgrade helper correctly identifies WebSocket upgrade requests using case-insensitive header checks and strings.Contains to handle multi-value headers like Connection: keep-alive, upgrade.
DESIGN.md (5)
107-108: LGTM! The WebSocket configuration options are well-documented and align with the AutoRouter struct fields shown in the context snippet (wsUpgrader, wsDialer, wsBillingCallback).
467-483: LGTM! Clear documentation of the Responses API streaming format and the automatic stream_options skipping behavior. The SSE event examples accurately reflect the Responses API protocol.
542-576: LGTM! The WebSocket adapter pattern is well-documented with clear interface definitions. The zero-dependency approach and gorilla/websocket compatibility notes are helpful for consumers.
580-646: LGTM! The gorilla/websocket adapter example is practical and correctly demonstrates that *websocket.Conn satisfies WSConn directly while Upgrader and Dialer need thin wrappers.
697-715: LGTM! Clear documentation of per-turn billing semantics and model prefix stripping behavior for WebSocket mode.
streaming.go (5)
91-97: LGTM! The ReasoningTokens field addition to StreamingUsage is consistent with the PR objective to expose reasoning tokens across all extraction paths.
204-231: LGTM! The Responses API streaming types correctly model the OpenAI Responses SSE event structure, including nested token details for cached and reasoning tokens.
247-263: LGTM! ParseResponsesSSEEvent follows the same pattern as ParseOpenAISSEEvent — trimming whitespace, handling [DONE] with ErrStreamComplete, and unmarshaling JSON.
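The parsing pattern praised here can be sketched as follows. This is an illustration under assumed names: parseResponsesSSEEvent, responsesEvent, and errStreamComplete stand in for the library's ParseResponsesSSEEvent, its event type, and its ErrStreamComplete sentinel.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// errStreamComplete stands in for the library's ErrStreamComplete sentinel.
var errStreamComplete = errors.New("stream complete")

// responsesEvent is a minimal illustrative event shape.
type responsesEvent struct {
	Type string `json:"type"`
}

// parseResponsesSSEEvent sketches the pattern: trim whitespace, treat
// "[DONE]" as a stream-complete sentinel, then unmarshal the JSON payload.
func parseResponsesSSEEvent(data string) (*responsesEvent, error) {
	data = strings.TrimSpace(data)
	if data == "[DONE]" {
		return nil, errStreamComplete
	}
	var ev responsesEvent
	if err := json.Unmarshal([]byte(data), &ev); err != nil {
		return nil, err
	}
	return &ev, nil
}

func main() {
	ev, _ := parseResponsesSSEEvent(`{"type":"response.completed"}`)
	fmt.Println(ev.Type) // response.completed

	_, err := parseResponsesSSEEvent("[DONE]")
	fmt.Println(errors.Is(err, errStreamComplete)) // true
}
```

Returning a sentinel error for [DONE] lets callers distinguish normal termination from parse failures.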
286-288: LGTM! Reasoning tokens are correctly extracted from CompletionTokensDetails when present and greater than zero.
334-365: LGTM! ExtractUsageFromResponsesEvent correctly extracts usage only from response.completed events, maps OpenAI's input_tokens/output_tokens naming to the canonical PromptTokens/CompletionTokens, and handles optional cache and reasoning token details.
autorouter_websocket_test.go (11)
18-78: LGTM! The mockWSConn implementation correctly simulates bidirectional WebSocket communication with proper close coordination using atomic.Bool and channels. The closeFromPeer method ensures both ends close when either side disconnects.
80-111: LGTM! The mockWSUpgrader and mockWSDialer correctly implement the WSUpgrader and WSDialer interfaces for test purposes, capturing dialed URLs and headers for verification.
126-158: LGTM! The wsTestProvider helper creates a properly configured mock WebSocket-capable provider with sensible defaults for parsing, enriching, and URL resolution.
198-235: LGTM! The mustReadFrame and mustReadError helpers include appropriate timeouts (2 seconds) to prevent test hangs while providing clear failure messages.
251-282: LGTM! TestForwardWebSocket_BasicRelay provides good end-to-end coverage of the WebSocket relay flow, validating message forwarding in both directions and proper cleanup on close.
284-356: LGTM! Tests for usage extraction, cache usage, and reasoning tokens correctly validate that the billing callback receives properly populated ResponseMetadata with token counts and custom fields.
358-403: LGTM! TestForwardWebSocket_ModelPrefixStripping and TestForwardWebSocket_MultiTurn validate critical behaviors: provider prefix removal from model names and correct turn counting across multiple request/response cycles.
405-437: LGTM! TestForwardWebSocket_BillingCallback correctly tests the integration with BillingCalculator, verifying that costs are computed and passed to the callback.
439-520: LGTM! Good coverage of edge cases: client close, upstream close, error event passthrough, missing WebSocket configuration, and non-WebSocket-capable provider detection.
522-569: LGTM! TestForwardWebSocket_PassthroughNonCreateMessages validates byte-for-byte passthrough of non-response.create messages, and TestServeHTTP_WebSocketDetection correctly tests the WebSocket upgrade detection path.
571-609: LGTM! TestServeHTTP_NonWebSocketUnchanged validates that regular HTTP POST requests are unaffected when WebSocket mode is configured, ensuring no regression in normal request handling.
providers/openai_compatible/responses_streaming_extractor.go (4)
15-35: LGTM! Clean composition pattern embedding ResponsesExtractor and proper dispatch between streaming and non-streaming paths based on content type.
37-79: LGTM! The non-streaming fallback correctly uses TeeReader to extract metadata while simultaneously writing the response to the client. The 512KB buffer and chunked flush pattern match the codebase conventions mentioned in the DESIGN.md.
81-132: LGTM! SSE parsing correctly sets streaming headers, uses appropriately sized scanner buffers (64KB initial, 1MB max), and handles the [DONE] marker and parse errors gracefully without breaking the stream.
134-179: Consider extracting reasoning_tokens once instead of twice. Reasoning tokens are extracted both at lines 149-151 (from the response object during event processing) and again at lines 174-176 (from accumulatedUsage). While this works correctly (the second assignment will overwrite), it's slightly redundant. However, this redundancy ensures both paths are covered if the response structure varies, so this is acceptable as-is.
providers/openai_compatible/responses_streaming_extractor_test.go (5)
14-29: LGTM! Clean test helper that encapsulates response setup, extraction, and result capture for reuse across test cases.
31-66: LGTM! Comprehensive lifecycle test covering the full Responses API event sequence from response.created through response.completed, validating both passthrough accuracy and metadata extraction.
68-122: LGTM! Good coverage of usage extraction variations including basic token counts, cache usage with input_tokens_details.cached_tokens, and reasoning tokens with output_tokens_details.reasoning_tokens.
124-204: LGTM! Edge case tests are thorough: function call streaming, error event passthrough, no response.completed event, empty stream with only [DONE], and malformed event handling that continues forwarding.
206-278: LGTM! Good test coverage for non-streaming fallback, IsStreamingResponse content type detection, event: prefix handling in SSE, and byte-accurate passthrough including comment lines (: ping).
autorouter_websocket.go (8)
17-41: LGTM! The initial setup correctly validates WebSocket configuration, upgrades the connection, reads the first message, and validates it's a response.create message before proceeding.
43-70: LGTM! Provider detection correctly reuses the existing detector and modelProviderLookup infrastructure, with proper fallback handling and WebSocket capability check via type assertion.
72-118: LGTM! Model prefix stripping, metadata parsing, URL resolution, header cloning, and request enrichment are all handled correctly before dialing the upstream WebSocket.
120-131: LGTM! The sync.Once-guarded closeBoth function ensures both connections are closed exactly once, preventing double-close errors and ensuring proper cleanup regardless of which relay goroutine exits first.
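The close-coordination pattern described here can be distilled to a few lines. The closer type below is an illustrative reduction, not the proxy's actual code: whichever relay goroutine exits first performs the close, and the second invocation is a no-op.

```go
package main

import (
	"fmt"
	"sync"
)

// closer demonstrates sync.Once-guarded close coordination: both
// connections are closed exactly once, no matter how many goroutines
// call closeBoth.
type closer struct {
	once sync.Once
}

func (c *closer) closeBoth(client, upstream func()) {
	c.once.Do(func() {
		client()
		upstream()
	})
}

func main() {
	var c closer
	calls := 0
	closeFn := func() { calls++ }
	c.closeBoth(closeFn, closeFn) // closes both ends
	c.closeBoth(closeFn, closeFn) // no-op on the second call
	fmt.Println(calls) // 2
}
```

This is why neither relay goroutine can trigger a double-close, regardless of exit order.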
133-180: LGTM! The model state is properly protected with sync.RWMutex for concurrent access from both relay goroutines. The error channel pattern correctly collects errors from both goroutines, filtering out expected close errors.
183-211: LGTM! relayClientToUpstream correctly handles model prefix stripping for subsequent response.create messages and passes through all other messages byte-for-byte.
213-261: LGTM! relayUpstreamToClient correctly extracts usage from response.completed events, increments the turn counter, populates ResponseMetadata with usage and custom fields (cache_usage, reasoning_tokens), calculates billing, and invokes the callback. Messages are forwarded regardless of usage extraction.
264-289: LGTM! Helper functions are simple and correct: rewriteWSCreateModel preserves all fields while updating model, cloneHeader creates a deep copy, and isWSRelayCloseError correctly identifies expected close conditions.
websocket.go (5)
10-17: LGTM! RFC 6455 message type constants are correctly defined with standard values. These match gorilla/websocket constants, ensuring seamless interoperability.
19-52: LGTM! Clean interface definitions that enable the zero-dependency adapter pattern. The WebSocketCapableProvider extension of Provider is a proper interface composition pattern.
54-70: LGTM! WSMessage captures the essential fields needed for routing decisions, and ParseWSMessage correctly preserves the raw JSON for later rewriting while extracting commonly accessed fields.
72-98: LGTM! The response structure types correctly model the OpenAI Responses API WebSocket payload format, including the nested envelope pattern where usage can appear at either top level or under response.
100-135: LGTM! ExtractWSUsage correctly handles both usage locations (top-level and nested under response), extracts cache and reasoning token details when present and non-zero, and returns nil, nil for non-response.completed events as documented.
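The dual-location lookup praised here can be sketched as follows. The struct shapes use the OpenAI Responses API field names, but the Go types and the extractWSUsage function are illustrative assumptions, not the library's actual ExtractWSUsage.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wsUsage mirrors the usage shape described in the review.
type wsUsage struct {
	InputTokens         int `json:"input_tokens"`
	OutputTokens        int `json:"output_tokens"`
	OutputTokensDetails struct {
		ReasoningTokens int `json:"reasoning_tokens"`
	} `json:"output_tokens_details"`
}

// wsEvent models the envelope: usage may appear at the top level or
// nested under "response".
type wsEvent struct {
	Type     string   `json:"type"`
	Usage    *wsUsage `json:"usage"`
	Response *struct {
		Usage *wsUsage `json:"usage"`
	} `json:"response"`
}

// extractWSUsage returns usage only for response.completed events,
// checking the top-level field first, then the nested one.
func extractWSUsage(raw []byte) *wsUsage {
	var ev wsEvent
	if err := json.Unmarshal(raw, &ev); err != nil || ev.Type != "response.completed" {
		return nil
	}
	if ev.Usage != nil {
		return ev.Usage
	}
	if ev.Response != nil {
		return ev.Response.Usage
	}
	return nil
}

func main() {
	msg := []byte(`{"type":"response.completed","response":{"usage":{"input_tokens":10,"output_tokens":20,"output_tokens_details":{"reasoning_tokens":5}}}}`)
	u := extractWSUsage(msg)
	fmt.Println(u.InputTokens, u.OutputTokens, u.OutputTokensDetails.ReasoningTokens) // 10 20 5
}
```

Returning nil for every other event type keeps the relay's per-turn billing tied to response.completed only.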
…n examples
- NewResolver now strips trailing /v1 from base URLs to prevent double /v1/v1/responses paths in both HTTP and WebSocket URL resolution
- README and DESIGN.md examples use origin whitelist instead of open CheckOrigin
- Fix bare fenced code block in DESIGN.md WebSocket flow diagram
Actionable comments posted: 1
🧹 Nitpick comments (1)
DESIGN.md (1)
483-506: Clarify stream_options injection scope to avoid mixed guidance. Line 483 says Responses API skips stream_options.include_usage, but lines 496-506 still describe auto-injection in unconditional terms. Please scope that section to non-Responses streaming (e.g., Chat Completions/OpenAI-compatible).
Suggested doc fix
-When `BillingCalculator` is configured and the request has `stream: true`, the proxy automatically injects:
+When `BillingCalculator` is configured and the request has `stream: true`, the proxy may inject:
@@
-This ensures OpenAI returns token usage in the streaming response for billing calculation.
+This is applied for APIs that require it (e.g., Chat Completions/OpenAI-compatible streaming).
+For Responses API streaming, injection is skipped because usage is delivered in `response.completed`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@DESIGN.md` around lines 483 - 506, Clarify the doc text to scope the auto-injection behavior: state that the proxy auto-injects {"stream": true, "stream_options": {"include_usage": true}} only for non-Responses streaming endpoints (e.g., OpenAI-compatible Chat Completions) when BillingCalculator is configured and the request has stream: true, and explicitly note that Responses API requests (which always include usage in response.completed and in Anthropic message_start/message_delta events) are excluded from this injection; update references to stream_options.include_usage, BillingCalculator, Responses API, and the proxy auto-injection paragraph to reflect this scoped behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN.md`:
- Around line 474-475: The doc incorrectly references MultiAPIExtractor; update
the text to use the exact class name introduced in code,
StreamingMultiAPIExtractor, so readers can directly map the design note to the
implementation (replace "MultiAPIExtractor" with "StreamingMultiAPIExtractor" in
the sentence about dispatching based on request context and the
response.completed event).
---
Nitpick comments:
In `@DESIGN.md`:
- Around line 483-506: Clarify the doc text to scope the auto-injection
behavior: state that the proxy auto-injects {"stream": true, "stream_options":
{"include_usage": true}} only for non-Responses streaming endpoints (e.g.,
OpenAI-compatible Chat Completions) when BillingCalculator is configured and the
request has stream: true, and explicitly note that Responses API requests (which
always include usage in response.completed and in Anthropic
message_start/message_delta events) are excluded from this injection; update
references to stream_options.include_usage, BillingCalculator, Responses API,
and the proxy auto-injection paragraph to reflect this scoped behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 38d80730-e1b3-4ccd-a578-bb2a19ceb4da
📒 Files selected for processing (4)
- DESIGN.md
- README.md
- providers/openai_compatible/resolver.go
- providers/openai_compatible/websocket_test.go
✅ Files skipped from review due to trivial changes (2)
- providers/openai_compatible/resolver.go
- providers/openai_compatible/websocket_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- README.md
📜 Review details
🔇 Additional comments (1)
DESIGN.md (1)
542-722: WebSocket design section is solid and implementation-aligned. The adapter interfaces, relay flow, per-turn billing lifecycle, and close semantics are clear and consistent with the implementation snippets.
…jection docs
- Correct MultiAPIExtractor → StreamingMultiAPIExtractor in usage extraction docs
- Clarify that stream_options injection only applies to Chat Completions, not Responses API, Anthropic, Bedrock, or Google AI (which include usage natively)
Summary
- stream_options.include_usage being incorrectly injected for Responses API requests (was causing 400 errors)
- Usage extraction from response.completed events
Changes
Bug Fix: stream_options on Responses API
The proxy was injecting stream_options.include_usage into Responses API streaming requests, which don't support that parameter. Fixed by detecting API type before injection and skipping for Responses API.
Responses API Streaming Extractor
New ResponsesStreamingExtractor that understands the Responses API SSE event format (response.created, response.output_text.delta, response.completed, etc.). Extracts usage, model, cache tokens, and reasoning tokens from response.completed events. The StreamingMultiAPIExtractor now dispatches to the correct extractor based on api_type in the request context.
WebSocket Mode (Adapter Pattern)
Implements persistent WebSocket connections for multi-turn Responses API workflows. Uses a zero-dependency adapter pattern — the library defines WSConn, WSUpgrader, and WSDialer interfaces that consumers implement with their preferred WS library (gorilla, nhooyr, etc.). gorilla's *websocket.Conn satisfies WSConn directly.
Features:
- Bidirectional relay with sync.Once close coordination
- Per-turn billing via WSBillingCallback
- Model prefix stripping in response.create messages
- Enabled via WithAutoRouterWebSocket(upgrader, dialer)
Reasoning Token Consistency
Previously only 1 of 5 extraction paths stored reasoning_tokens in metadata. Now all paths consistently expose it via meta.Custom["reasoning_tokens"].
Documentation
- DESIGN.md with WebSocket mode section, flow diagram, and gorilla example
- README.md with WebSocket section and complete Go + Python examples
Test Coverage
- stream_options fix: 2 subtests (path-based, body-based detection)
All tests pass:
go test ./... ✅
Summary by CodeRabbit
New Features
Enhancements
Documentation
Tests