LCORE-1880: Refactor of 503 responses #1572
asimurka wants to merge 2 commits into lightspeed-core:main from
Walkthrough
Adds and expands OpenAPI documentation for HTTP 503 "Service Unavailable" across many endpoints by importing
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/openapi.json`:
- Around line 147-163: Update the route response metadata for the GET handlers
for "/" and "/metrics" so the 503 response for "application/json" includes the
ServiceUnavailableResponse schema (reference ServiceUnavailableResponse) instead
of only an example, and if HTML 503 responses are intentional change the
"text/html" content to use a simple string schema with an HTML example; then
regenerate docs/openapi.json so the generated OpenAPI includes the schema for
application/json and the corrected text/html shape.
In `@src/app/endpoints/vector_stores.py`:
- Around line 795-800: The OpenAPI responses for the delete-vector-store
endpoint are incomplete; update the responses dict to include 401, 403, 404 and
500 entries to match the docstring and runtime behavior: add a 401 response
(authentication error) tied to get_auth_dependency(), a 403 response
(authorization error) for the `@authorize` decorator, a 404 response for the "File
not found" error raised in the endpoint, and a 500 response for configuration
errors triggered by check_configuration_loaded; create or reuse descriptive
response objects (or a module-level shared dict for file-delete responses) and
replace the current responses block so all possible status codes are documented
in the endpoint's OpenAPI spec.
- Around line 365-370: The OpenAPI responses for this delete endpoint are
incomplete; add 401, 403, 404 and 500 to the responses mapping and reuse a
shared module-level response dict (e.g., VECTOR_STORE_DELETE_RESPONSES or
DELETE_RESPONSES) like other endpoints. Include entries that correspond to
get_auth_dependency() (401), the `@authorize` decorator (403), the NotFound case
when the vector store is missing (404), and check_configuration_loaded (500),
keeping the existing 204 and 503 entries and using the same OpenAPI response
helper objects/examples used elsewhere in this file.
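The gap described in the two vector-store comments above could be caught mechanically. The following is a minimal sketch, assuming the route's `responses` mapping is available as a plain dict with numeric keys; `EXPECTED_DELETE_STATUS_CODES` and `missing_status_codes` are hypothetical names that do not exist in the repository, and the placeholder descriptions stand in for the project's real response helpers.

```python
"""Sketch: detect undocumented status codes in a FastAPI-style responses mapping."""

from __future__ import annotations

from typing import Any, Final

# Hypothetical constant: the codes the review says the delete handlers can return.
EXPECTED_DELETE_STATUS_CODES: Final[frozenset[int]] = frozenset(
    {204, 401, 403, 404, 500, 503}
)


def missing_status_codes(
    responses: dict[int | str, dict[str, Any]],
    expected: frozenset[int] = EXPECTED_DELETE_STATUS_CODES,
) -> list[int]:
    """Return expected status codes that the responses mapping does not document.

    Keys may be ints or numeric strings (the reviewed code mixes "204" and 503),
    so they are normalized to int before comparison.
    """
    documented = {int(code) for code in responses}
    return sorted(expected - documented)


# The mapping as described in the review: only 204 and 503 are documented.
current_responses: dict[int | str, dict[str, Any]] = {
    "204": {"description": "Vector store deleted"},
    503: {"description": "Service unavailable"},
}

print(missing_status_codes(current_responses))  # [401, 403, 404, 500]
```

A check like this could live in a unit test so that a route whose handler gains a new error path fails the build until its OpenAPI metadata is updated.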
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d836dd3f-49ee-4380-b5af-1e26e3a11464
📒 Files selected for processing (24)
docs/openapi.json, src/app/endpoints/authorized.py, src/app/endpoints/config.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/feedback.py, src/app/endpoints/health.py, src/app/endpoints/info.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/mcp_servers.py, src/app/endpoints/metrics.py, src/app/endpoints/models.py, src/app/endpoints/prompts.py, src/app/endpoints/providers.py, src/app/endpoints/query.py, src/app/endpoints/rags.py, src/app/endpoints/responses.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/root.py, src/app/endpoints/shields.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/streaming_query.py, src/app/endpoints/tools.py, src/app/endpoints/vector_stores.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: unit_tests (3.12)
- GitHub Check: black
- GitHub Check: integration_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use `Final[type]` as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use `typing_extensions.Self` for model validators in Pydantic models
Use modern union type syntax `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with `@abstractmethod` decorators
Use `@model_validator` and `@field_validator` for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not `Any`
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
src/app/endpoints/shields.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/models.py, src/app/endpoints/feedback.py, src/app/endpoints/tools.py, src/app/endpoints/providers.py, src/app/endpoints/config.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/query.py, src/app/endpoints/authorized.py, src/app/endpoints/info.py, src/app/endpoints/health.py, src/app/endpoints/rags.py, src/app/endpoints/prompts.py, src/app/endpoints/metrics.py, src/app/endpoints/responses.py, src/app/endpoints/mcp_servers.py, src/app/endpoints/root.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/vector_stores.py, src/app/endpoints/streaming_query.py, src/app/endpoints/conversations_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoints
Files:
src/app/endpoints/shields.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/models.py, src/app/endpoints/feedback.py, src/app/endpoints/tools.py, src/app/endpoints/providers.py, src/app/endpoints/config.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/query.py, src/app/endpoints/authorized.py, src/app/endpoints/info.py, src/app/endpoints/health.py, src/app/endpoints/rags.py, src/app/endpoints/prompts.py, src/app/endpoints/metrics.py, src/app/endpoints/responses.py, src/app/endpoints/mcp_servers.py, src/app/endpoints/root.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/vector_stores.py, src/app/endpoints/streaming_query.py, src/app/endpoints/conversations_v2.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models
Files:
src/app/endpoints/shields.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/models.py, src/app/endpoints/feedback.py, src/app/endpoints/tools.py, src/app/endpoints/providers.py, src/app/endpoints/config.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/query.py, src/app/endpoints/authorized.py, src/app/endpoints/info.py, src/app/endpoints/health.py, src/app/endpoints/rags.py, src/app/endpoints/prompts.py, src/app/endpoints/metrics.py, src/app/endpoints/responses.py, src/app/endpoints/mcp_servers.py, src/app/endpoints/root.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/vector_stores.py, src/app/endpoints/streaming_query.py, src/app/endpoints/conversations_v2.py
src/**/config*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/config*.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic models
Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Use type hints like `Optional[FilePath]`, `PositiveInt`, `SecretStr` in Pydantic models
Files:
src/app/endpoints/config.py
🧠 Learnings (5)
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoints
Applied to files:
src/app/endpoints/shields.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/models.py, src/app/endpoints/tools.py, src/app/endpoints/config.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/info.py, src/app/endpoints/metrics.py, src/app/endpoints/mcp_servers.py, src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/shields.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/models.py, src/app/endpoints/feedback.py, src/app/endpoints/tools.py, src/app/endpoints/providers.py, src/app/endpoints/config.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/query.py, src/app/endpoints/authorized.py, src/app/endpoints/info.py, src/app/endpoints/health.py, src/app/endpoints/rags.py, src/app/endpoints/prompts.py, src/app/endpoints/metrics.py, src/app/endpoints/responses.py, src/app/endpoints/mcp_servers.py, src/app/endpoints/root.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/vector_stores.py, src/app/endpoints/streaming_query.py, src/app/endpoints/conversations_v2.py
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Applied to files:
src/app/endpoints/models.py, src/app/endpoints/tools.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-04-16T19:08:38.217Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.
Applied to files:
src/app/endpoints/mcp_servers.py
🔇 Additional comments (25)
src/app/endpoints/query.py (1)
92-94: LGTM: 503 examples aligned with PR-wide convention.
Both `"llama stack"` and `"kubernetes api"` examples are appropriate here since `/query` can fail due to either backend (Llama Stack via `APIConnectionError` at line 320, or auth/k8s path).
src/app/endpoints/shields.py (1)
35-37: LGTM. 503 examples consistent with other llama-stack-backed list endpoints.
src/app/endpoints/rlsapi_v1.py (1)
94-96: LGTM. Both examples are justified: `_get_default_model_id`/`_call_llm` surface 503 on Llama Stack `APIConnectionError`, and auth can fail via k8s API.
src/app/endpoints/rags.py (1)
37-51: LGTM: consistent application to both `/rags` and `/rags/{rag_id}`.
src/app/endpoints/models.py (1)
69-71: LGTM.
src/app/endpoints/mcp_auth.py (1)
29-35: LGTM: correctly scoped to `kubernetes api` only.
This endpoint does not talk to Llama Stack (no `AsyncLlamaStackClientHolder` usage), so restricting the 503 example to `"kubernetes api"` aligns with the PR's stated goal of avoiding inconsistent examples on non-Llama-Stack endpoints.
One minor observation: the handler body doesn't currently raise a 503 itself; the documented 503 here represents failures in the auth middleware (k8s token review). Worth confirming this is the intended semantic for the OpenAPI consumers.
src/app/endpoints/tools.py (1)
99-101: LGTM. Both 503 sources are realized in the handler (`APIConnectionError` at lines 149 and 199).
src/app/endpoints/streaming_query.py (1)
142-144: LGTM. 503 examples match the error paths at lines 371-376 and 607-612.
src/app/endpoints/stream_interrupt.py (1)
12-18: LGTM: the 503 example is scoped correctly.
This endpoint is authenticated but does not call Llama Stack, so documenting only the Kubernetes API 503 example keeps the OpenAPI response consistent with the PR goal.
Also applies to: 28-34
src/app/endpoints/config.py (1)
13-19: LGTM: 503 documentation matches the endpoint dependencies.
Using only the Kubernetes API example avoids implying a Llama Stack dependency for `/config`.
Also applies to: 27-33
src/app/endpoints/responses.py (1)
113-134: LGTM: both 503 examples are appropriate here.
`/responses` can fail through both auth infrastructure and Llama Stack connectivity, so documenting `"kubernetes api"` and `"llama stack"` is consistent.
src/app/endpoints/authorized.py (1)
10-15: LGTM: Kubernetes-only 503 documentation fits this endpoint.
`/authorized` does not call Llama Stack, so the narrowed example set is consistent.
Also applies to: 21-26
src/app/endpoints/info.py (1)
28-35: LGTM: the 503 examples cover both failure sources.
`/info` depends on authentication and calls Llama Stack for version data, so both examples are warranted.
src/app/endpoints/feedback.py (1)
18-26: LGTM: feedback 503 examples are restricted correctly.
The authenticated POST/PUT endpoints document Kubernetes API unavailability, while avoiding a Llama Stack example that these handlers do not need.
Also applies to: 37-54
src/app/endpoints/providers.py (1)
33-52: LGTM: provider endpoints document the right 503 variants.
Both endpoints depend on auth infrastructure and Llama Stack provider APIs, so including both examples is consistent.
src/app/endpoints/metrics.py (1)
28-35: LGTM: 503 examples are consistent with the metrics endpoint surface.
The endpoint is authenticated and performs model metrics setup, so documenting both Kubernetes API and Llama Stack unavailability is reasonable.
src/app/endpoints/health.py (1)
50-52: 503 examples are correctly scoped for readiness vs liveness.
The documentation update looks consistent: readiness includes both backend and Kubernetes outage examples, while liveness keeps Kubernetes-only.
Also applies to: 59-59
src/app/endpoints/root.py (1)
16-16: OpenAPI 503 mapping is appropriate for the root endpoint.
Adding the 503 response with Kubernetes-focused example here is consistent and avoids over-documenting llama-stack failures.
Also applies to: 787-787
src/app/endpoints/prompts.py (1)
42-44: 503 documentation is consistent across all prompt endpoints.
Using both `llama stack` and `kubernetes api` examples is correct for this endpoint group.
Also applies to: 52-54, 63-65, 74-76, 84-86
src/app/endpoints/conversations_v2.py (1)
26-26: Good 503 coverage update for conversations v2.
The new OpenAPI entries are consistent and correctly limited to the Kubernetes outage example.
Also applies to: 45-45, 56-56, 66-66, 78-78
src/app/endpoints/conversations_v1.py (1)
68-70: 503 example expansion is correct for conversations v1.
Including both `llama stack` and `kubernetes api` examples matches the failure modes of these handlers.
Also applies to: 83-85, 95-97, 109-111
src/app/endpoints/mcp_servers.py (1)
41-43: 503 examples are well-scoped across MCP server endpoints.
`register`/`delete` correctly include llama-stack + Kubernetes, and `list` correctly stays Kubernetes-only.
Also applies to: 131-131, 182-184
docs/openapi.json (2)
871-889: LGTM: Kubernetes-only 503 responses are consistently documented.
These entries include the `ServiceUnavailableResponse` JSON schema and scope the example to Kubernetes API unavailability, matching the PR intent for endpoints that should not expose Llama Stack examples.
Also applies to: 1045-1063, 6349-6368, 6586-6605, 6793-6812, 7005-7024, 8160-8179, 8408-8427, 8637-8656, 8885-8904, 9944-9963, 10087-10106
2242-2250: LGTM: Dual backend outage examples are documented where needed.
The Llama Stack and Kubernetes API 503 examples are both represented with the shared `ServiceUnavailableResponse` shape for endpoints that can depend on both services.
Also applies to: 2433-2441, 2670-2678, 2902-2910, 3111-3119, 4411-4438, 5402-5429, 7202-7229
src/app/endpoints/vector_stores.py (1)
59-61: The example identifiers are valid and correctly implemented.
The example identifiers `"llama stack"` and `"kubernetes api"` are properly defined in the `ServiceUnavailableResponse` model configuration and are correctly passed to the `openapi_response()` method, which accepts an optional `examples` parameter to filter labeled examples for OpenAPI documentation.
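To illustrate the label-filtering behavior described for `openapi_response()`, here is a minimal stand-in written without FastAPI or Pydantic. It is not the project's implementation: `LABELED_EXAMPLES`, `openapi_response_sketch`, and the example payload texts are all assumptions made for illustration; only the two labels come from the review.

```python
"""Sketch: filter labeled OpenAPI examples by label (stand-in, not project code)."""

from __future__ import annotations

from typing import Any

# Hypothetical payloads; only the two labels are taken from the review text.
LABELED_EXAMPLES: dict[str, dict[str, Any]] = {
    "llama stack": {"detail": {"response": "Unable to connect to Llama Stack"}},
    "kubernetes api": {"detail": {"response": "Unable to connect to Kubernetes API"}},
}


def openapi_response_sketch(
    description: str,
    examples: list[str] | None = None,
) -> dict[str, Any]:
    """Build an OpenAPI response entry, keeping only the requested example labels.

    Passing ``None`` keeps every labeled example. An unknown label raises
    ``KeyError`` so that a typo fails loudly instead of silently dropping docs.
    """
    labels = list(LABELED_EXAMPLES) if examples is None else examples
    selected = {label: {"value": LABELED_EXAMPLES[label]} for label in labels}
    return {
        "description": description,
        "content": {"application/json": {"examples": selected}},
    }


k8s_only = openapi_response_sketch("Service unavailable", examples=["kubernetes api"])
print(list(k8s_only["content"]["application/json"]["examples"]))  # ['kubernetes api']
```

Raising on an unknown label is a design choice of this sketch; whether the real method does the same is not stated in the review.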
3720e95 to 5cc7ca2
Actionable comments posted: 2
♻️ Duplicate comments (4)
src/app/endpoints/vector_stores.py (2)
793-798: ⚠️ Potential issue | 🟠 Major
Complete the delete-vector-store-file OpenAPI responses.
This still documents only 204 and 503, but the handler can also return 401, 403, 404, and 500. Add the missing entries or use a shared response map for file deletion.
📝 Proposed direction
+vector_store_file_delete_responses: dict[int | str, dict[str, Any]] = {
+    204: {"description": "File deleted from vector store"},
+    401: UnauthorizedResponse.openapi_response(examples=UNAUTHORIZED_OPENAPI_EXAMPLES),
+    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
+    404: NotFoundResponse.openapi_response(examples=["file"]),
+    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
+    503: ServiceUnavailableResponse.openapi_response(
+        examples=["llama stack", "kubernetes api"]
+    ),
+}
+
 @router.delete(
     "/vector-stores/{vector_store_id}/files/{file_id}",
-    responses={
-        "204": {"description": "File deleted from vector store"},
-        503: ServiceUnavailableResponse.openapi_response(
-            examples=["llama stack", "kubernetes api"]
-        ),
-    },
+    responses=vector_store_file_delete_responses,
     status_code=status.HTTP_204_NO_CONTENT,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/vector_stores.py` around lines 793 - 798, The OpenAPI responses for the delete-vector-store-file endpoint are incomplete: the current responses dict only lists 204 and 503 but the handler can also return 401, 403, 404 and 500; update the responses mapping used in the route (the responses parameter in the delete_vector_store_file endpoint decorator / responses variable) to include entries for 401 (UnauthorizedResponse), 403 (ForbiddenResponse), 404 (NotFoundResponse), and 500 (InternalServerErrorResponse) or replace the inline dict with a shared FILE_DELETE_RESPONSES map that contains all of these codes plus the existing 204 and 503 to ensure the OpenAPI doc matches the handler behavior.
363-368: ⚠️ Potential issue | 🟠 Major
Complete the delete-vector-store OpenAPI responses.
This still documents only 204 and 503, but the handler can also return 401, 403, 404, and 500. Reuse a module-level response map so the delete endpoint matches its runtime errors and docstring.
📝 Proposed direction
+vector_store_delete_responses: dict[int | str, dict[str, Any]] = {
+    204: {"description": "Vector store deleted"},
+    401: UnauthorizedResponse.openapi_response(examples=UNAUTHORIZED_OPENAPI_EXAMPLES),
+    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
+    404: NotFoundResponse.openapi_response(examples=["vector store"]),
+    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
+    503: ServiceUnavailableResponse.openapi_response(
+        examples=["llama stack", "kubernetes api"]
+    ),
+}
+
 @router.delete(
     "/vector-stores/{vector_store_id}",
-    responses={
-        "204": {"description": "Vector store deleted"},
-        503: ServiceUnavailableResponse.openapi_response(
-            examples=["llama stack", "kubernetes api"]
-        ),
-    },
+    responses=vector_store_delete_responses,
     status_code=status.HTTP_204_NO_CONTENT,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/vector_stores.py` around lines 363 - 368, The delete_vector_store endpoint's responses currently list only 204 and a 503 built via ServiceUnavailableResponse.openapi_response; update its responses dict to reuse the module-level response map (the shared responses variable defined at top of this module) so the OpenAPI for delete_vector_store includes 401, 403, 404, and 500 in addition to 204 and 503, matching the handler's runtime errors and docstring; modify the responses mapping for the delete endpoint to merge or reference that module-level map rather than hardcoding only 204/503.
docs/openapi.json (2)
147-163: ⚠️ Potential issue | 🟠 Major
Attach the 503 schema to `application/json`, not `text/html`.
Line 147 documents a JSON 503 payload but omits the schema, while lines 159-163 attach the JSON `ServiceUnavailableResponse` model to `text/html`. Please fix the source response metadata and regenerate this generated file; if HTML is intentional, use a string/HTML schema there instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/openapi.json` around lines 147 - 163, The OpenAPI response media types are misassigned: attach the ServiceUnavailableResponse schema to the "application/json" media type and remove or replace the schema under "text/html" (use a plain string/HTML schema if HTML is intentional); update the response object where "application/json" currently only has examples and "text/html" references "#/components/schemas/ServiceUnavailableResponse", adjust so "application/json" contains "$ref": "#/components/schemas/ServiceUnavailableResponse" (and an example) and "text/html" uses type: string or an appropriate HTML schema, then regenerate the docs/openapi.json so the generated file reflects this change.
147-163: ⚠️ Potential issue | 🟠 Major
Attach the 503 schema to `application/json`, not `text/html`.
Line 147 documents a JSON 503 payload but omits the schema, while lines 159-163 attach the JSON `ServiceUnavailableResponse` model to `text/html`. Please fix the source response metadata and regenerate this generated file; if HTML is intentional, use a string/HTML schema there instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/openapi.json` around lines 147 - 163, The OpenAPI response has the ServiceUnavailableResponse schema incorrectly attached to "text/html" instead of the documented JSON example: move the "$ref": "#/components/schemas/ServiceUnavailableResponse" entry from the "text/html" media type to the "application/json" media type (so the "application/json" media type contains both the example and the schema), and if "text/html" should remain, replace its schema with a simple string/HTML schema (e.g., type: string, format: html) or remove it; after making this change for the response that uses ServiceUnavailableResponse, regenerate the openapi.json file.
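The target shape requested by the two `docs/openapi.json` comments can be written down as a plain dict. This is a sketch under assumptions: the description text, the placeholder example body, and the helper `json_schema_is_attached` are hypothetical; only the schema reference and the two media types come from the review.

```python
"""Sketch: the 503 response shape the comments ask for, as a plain dict."""

from __future__ import annotations

from typing import Any

SCHEMA_REF = "#/components/schemas/ServiceUnavailableResponse"

# JSON carries the model schema; HTML, if kept at all, is a plain string.
service_unavailable_503: dict[str, Any] = {
    "description": "Service unavailable",
    "content": {
        "application/json": {
            "schema": {"$ref": SCHEMA_REF},
            # Placeholder example body, not copied from the repository.
            "examples": {"kubernetes api": {"value": {"detail": "..."}}},
        },
        "text/html": {
            "schema": {"type": "string"},
            "example": "<html><body>Service unavailable</body></html>",
        },
    },
}


def json_schema_is_attached(response: dict[str, Any]) -> bool:
    """Return True when application/json references the model and text/html does not."""
    content = response["content"]
    json_ref = content.get("application/json", {}).get("schema", {}).get("$ref")
    html_ref = content.get("text/html", {}).get("schema", {}).get("$ref")
    return json_ref == SCHEMA_REF and html_ref is None


print(json_schema_is_attached(service_unavailable_503))  # True
```

Because `docs/openapi.json` is generated, a dict like this belongs in the route metadata for `/` and `/metrics`, with the JSON file regenerated afterwards rather than edited by hand.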
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/conversations_v1.py`:
- Around line 95-97: The ServiceUnavailableResponse examples list incorrectly
includes "llama stack" for the get_conversations_list_endpoint_handler; update
the ServiceUnavailableResponse.openapi_response call in conversations_v1.py to
remove the "llama stack" example so the examples only reflect services the
handler actually touches (e.g., keep "kubernetes api" or other local/db-relevant
examples), ensuring the examples list passed to
ServiceUnavailableResponse.openapi_response no longer contains "llama stack".
In `@src/app/endpoints/health.py`:
- Around line 50-52: The current OpenAPI mapping uses
ServiceUnavailableResponse.openapi_response with a "llama stack" example which
is misleading because Llama Stack readiness failures are represented as
ProviderHealthStatus entries inside a ReadinessResponse returned with HTTP 503;
update the OpenAPI responses so the ServiceUnavailableResponse.openapi_response
no longer lists "llama stack" (keep it Kubernetes-only) and add or modify a
ReadinessResponse (or a readiness-specific 503 schema) that includes
ProviderHealthStatus examples for Llama Stack failures; locate and change the
ServiceUnavailableResponse.openapi_response call and add/update a
ReadinessResponse (or readiness 503) example referencing ProviderHealthStatus to
represent Llama Stack failures correctly.
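One way to picture the readiness-specific 503 suggested for `health.py` is shown below. The dataclasses are stand-ins for `ReadinessResponse` and `ProviderHealthStatus`, and every field name and example value is an assumption, since the review does not show the real models.

```python
"""Sketch: a readiness-specific 503 example built from provider statuses."""

from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class ProviderHealthStatusSketch:
    """Stand-in for ProviderHealthStatus; field names are assumed."""

    provider_id: str
    status: str
    message: str | None = None


@dataclass
class ReadinessResponseSketch:
    """Stand-in for ReadinessResponse; field names are assumed."""

    ready: bool
    reason: str
    providers: list[ProviderHealthStatusSketch] = field(default_factory=list)


def readiness_503_response() -> dict[str, Any]:
    """Build a 503 OpenAPI entry whose example is a readiness body, not a generic error."""
    example = ReadinessResponseSketch(
        ready=False,
        reason="Providers not healthy: llama-stack",
        providers=[
            ProviderHealthStatusSketch(
                provider_id="llama-stack",
                status="Error",
                message="Connection refused",
            )
        ],
    )
    return {
        "description": "Service is not ready",
        "content": {
            "application/json": {
                "examples": {"llama stack": {"value": asdict(example)}}
            }
        },
    }


# Hypothetical responses mapping for the readiness route.
readiness_responses: dict[int | str, dict[str, Any]] = {503: readiness_503_response()}
```

The point of the sketch is the separation: the Llama Stack failure is expressed as a provider entry inside the readiness body, while the generic service-unavailable model would keep only its Kubernetes example.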
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6156f0ff-8907-471a-82a9-447aae224746
📒 Files selected for processing (24)
docs/openapi.json, src/app/endpoints/authorized.py, src/app/endpoints/config.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/feedback.py, src/app/endpoints/health.py, src/app/endpoints/info.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/mcp_servers.py, src/app/endpoints/metrics.py, src/app/endpoints/models.py, src/app/endpoints/prompts.py, src/app/endpoints/providers.py, src/app/endpoints/query.py, src/app/endpoints/rags.py, src/app/endpoints/responses.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/root.py, src/app/endpoints/shields.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/streaming_query.py, src/app/endpoints/tools.py, src/app/endpoints/vector_stores.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use `Final[type]` as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use `typing_extensions.Self` for model validators in Pydantic models
Use modern union type syntax `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with `@abstractmethod` decorators
Use `@model_validator` and `@field_validator` for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not `Any`
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
src/app/endpoints/shields.py, src/app/endpoints/rags.py, src/app/endpoints/feedback.py, src/app/endpoints/models.py, src/app/endpoints/metrics.py, src/app/endpoints/config.py, src/app/endpoints/query.py, src/app/endpoints/info.py, src/app/endpoints/tools.py, src/app/endpoints/streaming_query.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/health.py, src/app/endpoints/root.py, src/app/endpoints/authorized.py, src/app/endpoints/providers.py, src/app/endpoints/responses.py, src/app/endpoints/vector_stores.py, src/app/endpoints/prompts.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/mcp_servers.py
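A few of the conventions listed above can be illustrated with a minimal sketch. The constant and function names here are hypothetical and do not come from the repository; the sketch only shows a module docstring, a `Final` constant, an `Optional` parameter, full type annotations, a docstring with Parameters and Returns sections, and returning a new data structure instead of modifying the argument in place.

```python
"""Illustrative module: record a status code for a named dependency."""

from typing import Final, Optional

# Hypothetical shared constant; a real project would look in constants.py first.
DEFAULT_STATUS_CODE: Final[int] = 503


def get_updated_statuses(
    statuses: dict[str, int], name: str, code: Optional[int] = None
) -> dict[str, int]:
    """Return a copy of the status map with one entry added or replaced.

    Parameters:
        statuses: Existing mapping of dependency name to status code.
        name: Dependency whose status is being recorded.
        code: Status code to record; falls back to DEFAULT_STATUS_CODE.

    Returns:
        A new dictionary; the input mapping is left untouched.
    """
    new_code = DEFAULT_STATUS_CODE if code is None else code
    # Build a new dict rather than mutating the caller's argument.
    return {**statuses, name: new_code}
```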
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoints
Files:
src/app/endpoints/shields.py, src/app/endpoints/rags.py, src/app/endpoints/feedback.py, src/app/endpoints/models.py, src/app/endpoints/metrics.py, src/app/endpoints/config.py, src/app/endpoints/query.py, src/app/endpoints/info.py, src/app/endpoints/tools.py, src/app/endpoints/streaming_query.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/health.py, src/app/endpoints/root.py, src/app/endpoints/authorized.py, src/app/endpoints/providers.py, src/app/endpoints/responses.py, src/app/endpoints/vector_stores.py, src/app/endpoints/prompts.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/mcp_servers.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models
Files:
src/app/endpoints/shields.py, src/app/endpoints/rags.py, src/app/endpoints/feedback.py, src/app/endpoints/models.py, src/app/endpoints/metrics.py, src/app/endpoints/config.py, src/app/endpoints/query.py, src/app/endpoints/info.py, src/app/endpoints/tools.py, src/app/endpoints/streaming_query.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/health.py, src/app/endpoints/root.py, src/app/endpoints/authorized.py, src/app/endpoints/providers.py, src/app/endpoints/responses.py, src/app/endpoints/vector_stores.py, src/app/endpoints/prompts.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/mcp_servers.py
src/**/config*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/config*.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic models
Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Use type hints like `Optional[FilePath]`, `PositiveInt`, `SecretStr` in Pydantic models
Files:
src/app/endpoints/config.py
🧠 Learnings (4)
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoints
Applied to files:
src/app/endpoints/shields.py, src/app/endpoints/models.py, src/app/endpoints/metrics.py, src/app/endpoints/config.py, src/app/endpoints/info.py, src/app/endpoints/tools.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/vector_stores.py, src/app/endpoints/prompts.py, src/app/endpoints/mcp_servers.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/shields.py, src/app/endpoints/rags.py, src/app/endpoints/feedback.py, src/app/endpoints/models.py, src/app/endpoints/metrics.py, src/app/endpoints/config.py, src/app/endpoints/query.py, src/app/endpoints/info.py, src/app/endpoints/tools.py, src/app/endpoints/streaming_query.py, src/app/endpoints/rlsapi_v1.py, src/app/endpoints/stream_interrupt.py, src/app/endpoints/conversations_v1.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/health.py, src/app/endpoints/root.py, src/app/endpoints/authorized.py, src/app/endpoints/providers.py, src/app/endpoints/responses.py, src/app/endpoints/vector_stores.py, src/app/endpoints/prompts.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/mcp_servers.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Applied to files:
src/app/endpoints/tools.py
🔇 Additional comments (23)
src/app/endpoints/shields.py (1)
35-37: LGTM, the 503 examples are scoped correctly.
This endpoint can hit both the Llama Stack backend and authenticated platform dependencies, so documenting both examples is consistent with the route behavior.
src/app/endpoints/rags.py (1)
37-39: LGTM, both RAG routes now document the applicable 503 cases.
The list and detail handlers both depend on authenticated access and Llama Stack vector-store calls, so the two examples are appropriate.
Also applies to: 48-50
src/app/endpoints/models.py (1)
69-71: LGTM, the documented 503 examples match this endpoint.
`/models` can fail due to Llama Stack connectivity and authenticated platform dependencies, so including both examples is consistent.
src/app/endpoints/rlsapi_v1.py (1)
90-92: LGTM, the 503 documentation reflects the inference failure surfaces.
The endpoint can encounter both Llama Stack service failures and authenticated platform dependency failures, so these examples are appropriate.
src/app/endpoints/query.py (1)
92-94: LGTM, the added 503 examples are consistent with `/query`.
This route has both Llama Stack calls and authenticated platform dependencies, so documenting both cases fits the implementation.
src/app/endpoints/streaming_query.py (1)
142-144: LGTM, the streaming route’s 503 examples are appropriate.
The endpoint can fail with a 503 before stream creation for backend/platform dependency issues, so both examples are valid.
src/app/endpoints/responses.py (1)
134-136: LGTM, `/responses` documents the relevant 503 cases.
Both Llama Stack failures and authenticated platform dependency failures are applicable to this endpoint.
src/app/endpoints/config.py (1)
18-18: LGTM, the 503 example is correctly limited to Kubernetes API.
`/config` is authenticated but does not call Llama Stack, so excluding the Llama Stack example avoids the inconsistency this PR targets.
Also applies to: 32-32
src/app/endpoints/providers.py (1)
38-52: LGTM!
Both `providers_list_responses` and `provider_get_responses` correctly include `"llama stack"` and `"kubernetes api"` examples, consistent with the handlers' actual 503 source (Llama Stack `APIConnectionError` at lines 92-95 and 163-166) plus the auth middleware's Kubernetes API dependency.
src/app/endpoints/feedback.py (1)
25-54: LGTM!
Restricting the 503 example to `"kubernetes api"` is appropriate since neither `feedback_endpoint_handler` nor `update_feedback_status` reaches Llama Stack; the only 503 source here is the auth middleware (Kubernetes API).
src/app/endpoints/root.py (1)
16-16: LGTM!
Handler just returns static HTML, so a 503 can only originate from the auth middleware; the `"kubernetes api"` example is the right (and only) one to document.
Also applies to: 787-787
src/app/endpoints/tools.py (1)
99-101: LGTM!
Both examples are justified: the handler raises 503 on Llama Stack `APIConnectionError` (lines 149-152, 199-204), and the auth middleware can surface Kubernetes API 503s.
src/app/endpoints/authorized.py (1)
14-14: LGTM!
Handler does not touch Llama Stack; restricting the 503 example to `"kubernetes api"` accurately reflects the only realistic 503 source (auth backend).
Also applies to: 25-25
src/app/endpoints/stream_interrupt.py (1)
16-16: LGTM!
Handler only interacts with the local `StreamInterruptRegistry`, so `"kubernetes api"` (auth middleware) is the correct sole 503 example.
Also applies to: 33-33
src/app/endpoints/metrics.py (1)
32-32: LGTM!
Metrics endpoint does not call Llama Stack; scoping the 503 example to `"kubernetes api"` is consistent with the PR's stated intent of avoiding inconsistent examples for non-Llama-Stack endpoints.
src/app/endpoints/mcp_auth.py (1)
20-20: LGTM!
Handler reads from local `configuration` only (no Llama Stack call), so the `"kubernetes api"`-only example correctly represents the auth-middleware 503 path.
Also applies to: 34-34
src/app/endpoints/info.py (1)
32-34: LGTM, the 503 examples match this endpoint’s failure paths.
`/info` can hit Llama Stack via `client.inspect.version()` and is also authenticated, so documenting both examples is consistent.
src/app/endpoints/prompts.py (1)
44-46: LGTM, prompt endpoints consistently document both 503 sources.
Each prompt handler reaches Llama Stack and is authenticated, so the `"llama stack"` and `"kubernetes api"` examples are appropriate.
Also applies to: 54-56, 66-68, 78-80, 89-91
src/app/endpoints/health.py (1)
59-59: LGTM, liveness only documents the auth/Kubernetes 503 path.
The liveness handler does not call Llama Stack, so keeping this to `"kubernetes api"` matches the PR objective.
src/app/endpoints/conversations_v1.py (1)
68-70: LGTM, these conversation endpoints access Llama Stack.
The get/delete/update handlers all call Llama Stack APIs and are authenticated, so documenting both 503 examples is consistent.
Also applies to: 83-85, 109-111
src/app/endpoints/vector_stores.py (1)
58-60: LGTM, vector-store 503 examples match the Llama Stack-backed handlers.
These response maps are used by handlers that call Llama Stack and require auth, so both examples are appropriate.
Also applies to: 69-71, 80-82, 91-93, 102-104
src/app/endpoints/mcp_servers.py (1)
41-43: LGTM, MCP 503 examples are scoped correctly.
Register/delete include Llama Stack because they call toolgroup APIs; list stays Kubernetes-only because it only reads local configuration.
Also applies to: 131-131, 182-184
src/app/endpoints/conversations_v2.py (1)
26-78: LGTM!
The `ServiceUnavailableResponse` import and the 503 entries added to all four response maps are consistent with the PR's goal: since these conversation endpoints interact with the conversation cache (and potentially the Kubernetes API via auth), restricting the 503 example to `"kubernetes api"` (and excluding `"llama stack"`) correctly avoids the inconsistent example noted in the PR description.
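Several comments above note that handlers raise a 503 when Llama Stack raises `APIConnectionError`. A minimal sketch of that translation follows. Both exception classes are local stand-ins (the real ones come from `llama_stack_client` and FastAPI), `call_backend` is a hypothetical helper, and the code is synchronous for brevity, whereas the project guidelines call for `async def` on I/O paths.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class APIConnectionError(Exception):
    """Stand-in for the Llama Stack client's connection error."""


class ServiceUnavailable(Exception):
    """Stand-in for the HTTP 503 error a handler would raise."""

    def __init__(self, backend: str) -> None:
        super().__init__(f"Unable to connect to {backend}")
        self.status_code = 503
        self.backend = backend


def call_backend(fetch: Callable[[], T]) -> T:
    """Run a backend call and convert a connection failure into a 503."""
    try:
        return fetch()
    except APIConnectionError as exc:
        # Keep the original error as the cause so it is still visible in logs.
        raise ServiceUnavailable("Llama Stack") from exc
```

Endpoints that never call the backend have no such code path, which is why the review accepts a Kubernetes-only 503 example for them.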
5cc7ca2 to 3e2e038
Description
Refactors the 503 response examples. Previously there was only a single example under 503, and adding a Kubernetes API example made the OpenAPI documentation inconsistent. This PR adds an explicit 503 response entry to every endpoint that requires authentication and restricts the examples for endpoints that do not access the Llama Stack service.
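The idea of restricting 503 examples per endpoint can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the helper name, the example payloads, and the `*_responses` variables are hypothetical, and only the example names `"llama stack"` and `"kubernetes api"` are taken from the review. The returned dictionary follows the OpenAPI response layout (`content` → media type → `examples` → name → `value`).

```python
from typing import Any, Final

# Hypothetical payloads; the real bodies are defined by the project's models.
SERVICE_UNAVAILABLE_EXAMPLES: Final[dict[str, dict[str, Any]]] = {
    "llama stack": {"detail": "Unable to connect to Llama Stack"},
    "kubernetes api": {"detail": "Unable to connect to Kubernetes API"},
}


def build_503_response(example_names: list[str]) -> dict[str, Any]:
    """Build an OpenAPI 503 response entry limited to the named examples."""
    unknown = [n for n in example_names if n not in SERVICE_UNAVAILABLE_EXAMPLES]
    if unknown:
        raise ValueError(f"Unknown 503 examples: {unknown}")
    return {
        "description": "Service unavailable",
        "content": {
            "application/json": {
                "examples": {
                    name: {"value": SERVICE_UNAVAILABLE_EXAMPLES[name]}
                    for name in example_names
                }
            }
        },
    }


# Authenticated endpoint that also calls Llama Stack: both examples apply.
query_responses: dict[int, dict[str, Any]] = {
    503: build_503_response(["llama stack", "kubernetes api"]),
}

# Authenticated endpoint that never calls Llama Stack: Kubernetes API only.
config_responses: dict[int, dict[str, Any]] = {
    503: build_503_response(["kubernetes api"]),
}
```

A map of this shape is what FastAPI accepts through the `responses=` argument of a route decorator, so each endpoint documents only the failure sources it can actually hit.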
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit