LCORE-1880: Refactor of 503 responses #1572

Open
asimurka wants to merge 2 commits into lightspeed-core:main from asimurka:refactor_503_responses

Conversation

@asimurka
Contributor

@asimurka asimurka commented Apr 22, 2026

Description

Refactors the 503 status response examples. Previously there was only a single example under 503, and adding a Kubernetes API example introduced inconsistency into the OpenAPI documentation. This PR adds an explicit 503 response record to every endpoint that requires authentication, and restricts the examples for endpoints that do not access the Llama Stack service.
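For illustration, here is a minimal, self-contained sketch of the pattern the PR applies. The `ServiceUnavailableResponse` stub below is hypothetical and stands in for the project's real model; only the helper name and the "llama stack" / "kubernetes api" example labels come from the diff summaries in this PR:

```python
# Hypothetical stand-in for the project's ServiceUnavailableResponse model;
# the real class lives elsewhere in the lightspeed-stack codebase.
class ServiceUnavailableResponse:
    # Labeled examples; each endpoint selects only the ones that apply to it.
    _examples = {
        "llama stack": {"detail": "Llama Stack service is unavailable"},
        "kubernetes api": {"detail": "Kubernetes API is unavailable"},
    }

    @classmethod
    def openapi_response(cls, examples=None):
        """Build an OpenAPI 503 response object with a filtered example set."""
        selected = examples or list(cls._examples)
        return {
            "description": "Service Unavailable",
            "content": {
                "application/json": {
                    "examples": {
                        name: {"value": cls._examples[name]} for name in selected
                    }
                }
            },
        }


# An endpoint that requires authentication but never calls Llama Stack
# documents only the Kubernetes API outage example:
responses = {
    503: ServiceUnavailableResponse.openapi_response(examples=["kubernetes api"]),
}
print(list(responses[503]["content"]["application/json"]["examples"]))  # → ['kubernetes api']
```

Endpoints that do talk to Llama Stack would pass `examples=["llama stack", "kubernetes api"]` instead, which is the dual-example variant the review comments below refer to.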

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Documentation
    • Enhanced OpenAPI docs to add 503 "Service Unavailable" responses across API endpoints, including concrete example payloads (e.g., "kubernetes api" and where applicable "llama stack") to clarify service-outage scenarios; one endpoint also includes a text/html example for 503.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 22, 2026

Walkthrough

Adds and expands OpenAPI documentation for HTTP 503 "Service Unavailable" across many endpoints by importing ServiceUnavailableResponse and adding 503 response mappings with example payloads (commonly "kubernetes api" and "llama stack"); docs/openapi.json updated accordingly.

Changes

Cohort / File(s) Summary
OpenAPI spec
docs/openapi.json
Added/expanded 503 responses across many operations; ServiceUnavailableResponse referenced with application/json (and one text/html) media types and example objects ("kubernetes api", "llama stack").
New 503 entries / imports
src/app/endpoints/authorized.py, src/app/endpoints/config.py, src/app/endpoints/mcp_auth.py, src/app/endpoints/root.py, src/app/endpoints/stream_interrupt.py
Imported ServiceUnavailableResponse and added explicit 503 response mappings using ServiceUnavailableResponse.openapi_response(examples=["kubernetes api"]).
Added 503 to conversation endpoints
src/app/endpoints/conversations_v1.py, src/app/endpoints/conversations_v2.py
Inserted/updated 503 entries in conversation response maps; examples set to ["llama stack", "kubernetes api"] (v1) or ["kubernetes api"] (v2).
Health, info, metrics, and status endpoints
src/app/endpoints/health.py, src/app/endpoints/info.py, src/app/endpoints/metrics.py, src/app/endpoints/rlsapi_v1.py
Updated 503 response metadata to include example arrays (various combinations of "llama stack" and "kubernetes api").
Feedback, prompts, providers, models, query, responses, shields, tools, streaming, rags
src/app/endpoints/feedback.py, src/app/endpoints/prompts.py, src/app/endpoints/providers.py, src/app/endpoints/models.py, src/app/endpoints/query.py, src/app/endpoints/responses.py, src/app/endpoints/shields.py, src/app/endpoints/tools.py, src/app/endpoints/streaming_query.py, src/app/endpoints/rags.py
Replaced zero-argument ServiceUnavailableResponse.openapi_response() calls with versions including examples (mostly ["llama stack","kubernetes api"]). No handler logic changed.
MCP servers & vector stores
src/app/endpoints/mcp_servers.py, src/app/endpoints/vector_stores.py
Expanded 503 examples for register/list/delete and vector-store endpoints; added 503 entries to DELETE vector-store routes alongside existing success responses.
Miscellaneous small edits
src/app/endpoints/metrics.py, .../models.py, .../shields.py (see above)
Minor OpenAPI metadata adjustments to add example payloads for 503 responses; no runtime behavior changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly summarizes the main change: refactoring 503 service unavailable responses across the codebase by adding explicit examples to OpenAPI documentation.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/openapi.json`:
- Around line 147-163: Update the route response metadata for the GET handlers
for "/" and "/metrics" so the 503 response for "application/json" includes the
ServiceUnavailableResponse schema (reference ServiceUnavailableResponse) instead
of only an example, and if HTML 503 responses are intentional change the
"text/html" content to use a simple string schema with an HTML example; then
regenerate docs/openapi.json so the generated OpenAPI includes the schema for
application/json and the corrected text/html shape.

In `@src/app/endpoints/vector_stores.py`:
- Around line 795-800: The OpenAPI responses for the delete-vector-store
endpoint are incomplete; update the responses dict to include 401, 403, 404 and
500 entries to match the docstring and runtime behavior: add a 401 response
(authentication error) tied to get_auth_dependency(), a 403 response
(authorization error) for the `@authorize` decorator, a 404 response for the "File
not found" error raised in the endpoint, and a 500 response for configuration
errors triggered by check_configuration_loaded; create or reuse descriptive
response objects (or a module-level shared dict for file-delete responses) and
replace the current responses block so all possible status codes are documented
in the endpoint's OpenAPI spec.
- Around line 365-370: The OpenAPI responses for this delete endpoint are
incomplete; add 401, 403, 404 and 500 to the responses mapping and reuse a
shared module-level response dict (e.g., VECTOR_STORE_DELETE_RESPONSES or
DELETE_RESPONSES) like other endpoints. Include entries that correspond to
get_auth_dependency() (401), the `@authorize` decorator (403), the NotFound case
when the vector store is missing (404), and check_configuration_loaded (500),
keeping the existing 204 and 503 entries and using the same OpenAPI response
helper objects/examples used elsewhere in this file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d836dd3f-49ee-4380-b5af-1e26e3a11464

📥 Commits

Reviewing files that changed from the base of the PR and between fc47e17 and 3720e95.

📒 Files selected for processing (24)
  • docs/openapi.json
  • src/app/endpoints/authorized.py
  • src/app/endpoints/config.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/health.py
  • src/app/endpoints/info.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/mcp_servers.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/models.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/query.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/root.py
  • src/app/endpoints/shields.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/vector_stores.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: black
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/models.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/config.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/query.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/info.py
  • src/app/endpoints/health.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/mcp_servers.py
  • src/app/endpoints/root.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/models.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/config.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/query.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/info.py
  • src/app/endpoints/health.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/mcp_servers.py
  • src/app/endpoints/root.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations_v2.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/models.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/config.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/query.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/info.py
  • src/app/endpoints/health.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/mcp_servers.py
  • src/app/endpoints/root.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations_v2.py
src/**/config*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/config*.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic models
Use @field_validator and @model_validator for custom validation in Pydantic models
Use type hints like Optional[FilePath], PositiveInt, SecretStr in Pydantic models

Files:

  • src/app/endpoints/config.py
🧠 Learnings (5)
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoints

Applied to files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/models.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/config.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/info.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/mcp_servers.py
  • src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/models.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/config.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/query.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/info.py
  • src/app/endpoints/health.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/mcp_servers.py
  • src/app/endpoints/root.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations_v2.py
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/app/endpoints/models.py
  • src/app/endpoints/tools.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2026-04-16T19:08:38.217Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Applied to files:

  • src/app/endpoints/mcp_servers.py
🔇 Additional comments (25)
src/app/endpoints/query.py (1)

92-94: LGTM — 503 examples aligned with PR-wide convention.

Both "llama stack" and "kubernetes api" examples are appropriate here since /query can fail due to either backend (Llama Stack via APIConnectionError at line 320, or auth/k8s path).

src/app/endpoints/shields.py (1)

35-37: LGTM.

503 examples consistent with other llama-stack-backed list endpoints.

src/app/endpoints/rlsapi_v1.py (1)

94-96: LGTM.

Both examples are justified: _get_default_model_id / _call_llm surface 503 on Llama Stack APIConnectionError, and auth can fail via k8s API.

src/app/endpoints/rags.py (1)

37-51: LGTM — consistent application to both /rags and /rags/{rag_id}.

src/app/endpoints/models.py (1)

69-71: LGTM.

src/app/endpoints/mcp_auth.py (1)

29-35: LGTM — correctly scoped to kubernetes api only.

This endpoint does not talk to Llama Stack (no AsyncLlamaStackClientHolder usage), so restricting the 503 example to "kubernetes api" aligns with the PR's stated goal of avoiding inconsistent examples on non-Llama-Stack endpoints.

One minor observation: the handler body doesn't currently raise a 503 itself — the documented 503 here represents failures in the auth middleware (k8s token review). Worth confirming this is the intended semantic for the OpenAPI consumers.

src/app/endpoints/tools.py (1)

99-101: LGTM.

Both 503 sources are realized in the handler (APIConnectionError at lines 149 and 199).

src/app/endpoints/streaming_query.py (1)

142-144: LGTM.

503 examples match the error paths at lines 371-376 and 607-612.

src/app/endpoints/stream_interrupt.py (1)

12-18: LGTM — the 503 example is scoped correctly.

This endpoint is authenticated but does not call Llama Stack, so documenting only the Kubernetes API 503 example keeps the OpenAPI response consistent with the PR goal.

Also applies to: 28-34

src/app/endpoints/config.py (1)

13-19: LGTM — 503 documentation matches the endpoint dependencies.

Using only the Kubernetes API example avoids implying a Llama Stack dependency for /config.

Also applies to: 27-33

src/app/endpoints/responses.py (1)

113-134: LGTM — both 503 examples are appropriate here.

/responses can fail through both auth infrastructure and Llama Stack connectivity, so documenting "kubernetes api" and "llama stack" is consistent.

src/app/endpoints/authorized.py (1)

10-15: LGTM — Kubernetes-only 503 documentation fits this endpoint.

/authorized does not call Llama Stack, so the narrowed example set is consistent.

Also applies to: 21-26

src/app/endpoints/info.py (1)

28-35: LGTM — the 503 examples cover both failure sources.

/info depends on authentication and calls Llama Stack for version data, so both examples are warranted.

src/app/endpoints/feedback.py (1)

18-26: LGTM — feedback 503 examples are restricted correctly.

The authenticated POST/PUT endpoints document Kubernetes API unavailability, while avoiding a Llama Stack example that these handlers do not need.

Also applies to: 37-54

src/app/endpoints/providers.py (1)

33-52: LGTM — provider endpoints document the right 503 variants.

Both endpoints depend on auth infrastructure and Llama Stack provider APIs, so including both examples is consistent.

src/app/endpoints/metrics.py (1)

28-35: LGTM — 503 examples are consistent with the metrics endpoint surface.

The endpoint is authenticated and performs model metrics setup, so documenting both Kubernetes API and Llama Stack unavailability is reasonable.

src/app/endpoints/health.py (1)

50-52: 503 examples are correctly scoped for readiness vs liveness.

The documentation update looks consistent: readiness includes both backend and Kubernetes outage examples, while liveness keeps Kubernetes-only.

Also applies to: 59-59

src/app/endpoints/root.py (1)

16-16: OpenAPI 503 mapping is appropriate for the root endpoint.

Adding the 503 response with Kubernetes-focused example here is consistent and avoids over-documenting llama-stack failures.

Also applies to: 787-787

src/app/endpoints/prompts.py (1)

42-44: 503 documentation is consistent across all prompt endpoints.

Using both llama stack and kubernetes api examples is correct for this endpoint group.

Also applies to: 52-54, 63-65, 74-76, 84-86

src/app/endpoints/conversations_v2.py (1)

26-26: Good 503 coverage update for conversations v2.

The new OpenAPI entries are consistent and correctly limited to the Kubernetes outage example.

Also applies to: 45-45, 56-56, 66-66, 78-78

src/app/endpoints/conversations_v1.py (1)

68-70: 503 example expansion is correct for conversations v1.

Including both llama stack and kubernetes api examples matches the failure modes of these handlers.

Also applies to: 83-85, 95-97, 109-111

src/app/endpoints/mcp_servers.py (1)

41-43: 503 examples are well-scoped across MCP server endpoints.

register/delete correctly include llama-stack + Kubernetes, and list correctly stays Kubernetes-only.

Also applies to: 131-131, 182-184

docs/openapi.json (2)

871-889: LGTM: Kubernetes-only 503 responses are consistently documented.

These entries include the ServiceUnavailableResponse JSON schema and scope the example to Kubernetes API unavailability, matching the PR intent for endpoints that should not expose Llama Stack examples.

Also applies to: 1045-1063, 6349-6368, 6586-6605, 6793-6812, 7005-7024, 8160-8179, 8408-8427, 8637-8656, 8885-8904, 9944-9963, 10087-10106


2242-2250: LGTM: Dual backend outage examples are documented where needed.

The Llama Stack and Kubernetes API 503 examples are both represented with the shared ServiceUnavailableResponse shape for endpoints that can depend on both services.

Also applies to: 2433-2441, 2670-2678, 2902-2910, 3111-3119, 4411-4438, 5402-5429, 7202-7229

src/app/endpoints/vector_stores.py (1)

59-61: The example identifiers are valid and correctly implemented.

The example identifiers "llama stack" and "kubernetes api" are properly defined in the ServiceUnavailableResponse model configuration and are correctly passed to the openapi_response() method, which accepts an optional examples parameter to filter labeled examples for OpenAPI documentation.

@asimurka asimurka requested a review from tisnik April 22, 2026 12:52
@asimurka asimurka force-pushed the refactor_503_responses branch from 3720e95 to 5cc7ca2 Compare April 22, 2026 13:10
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (4)
src/app/endpoints/vector_stores.py (2)

793-798: ⚠️ Potential issue | 🟠 Major

Complete the delete-vector-store-file OpenAPI responses.

This still documents only 204 and 503, but the handler can also return 401, 403, 404, and 500. Add the missing entries or use a shared response map for file deletion.

📝 Proposed direction
+vector_store_file_delete_responses: dict[int | str, dict[str, Any]] = {
+    204: {"description": "File deleted from vector store"},
+    401: UnauthorizedResponse.openapi_response(examples=UNAUTHORIZED_OPENAPI_EXAMPLES),
+    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
+    404: NotFoundResponse.openapi_response(examples=["file"]),
+    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
+    503: ServiceUnavailableResponse.openapi_response(
+        examples=["llama stack", "kubernetes api"]
+    ),
+}
+
 @router.delete(
     "/vector-stores/{vector_store_id}/files/{file_id}",
-    responses={
-        "204": {"description": "File deleted from vector store"},
-        503: ServiceUnavailableResponse.openapi_response(
-            examples=["llama stack", "kubernetes api"]
-        ),
-    },
+    responses=vector_store_file_delete_responses,
     status_code=status.HTTP_204_NO_CONTENT,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 793 - 798, The OpenAPI
responses for the delete-vector-store-file endpoint are incomplete: the current
responses dict only lists 204 and 503 but the handler can also return 401, 403,
404 and 500; update the responses mapping used in the route (the responses
parameter in the delete_vector_store_file endpoint decorator / responses
variable) to include entries for 401 (UnauthorizedResponse), 403
(ForbiddenResponse), 404 (NotFoundResponse), and 500
(InternalServerErrorResponse) or replace the inline dict with a shared
FILE_DELETE_RESPONSES map that contains all of these codes plus the existing 204
and 503 to ensure the OpenAPI doc matches the handler behavior.

363-368: ⚠️ Potential issue | 🟠 Major

Complete the delete-vector-store OpenAPI responses.

This still documents only 204 and 503, but the handler can also return 401, 403, 404, and 500. Reuse a module-level response map so the delete endpoint matches its runtime errors and docstring.

📝 Proposed direction
+vector_store_delete_responses: dict[int | str, dict[str, Any]] = {
+    204: {"description": "Vector store deleted"},
+    401: UnauthorizedResponse.openapi_response(examples=UNAUTHORIZED_OPENAPI_EXAMPLES),
+    403: ForbiddenResponse.openapi_response(examples=["endpoint"]),
+    404: NotFoundResponse.openapi_response(examples=["vector store"]),
+    500: InternalServerErrorResponse.openapi_response(examples=["configuration"]),
+    503: ServiceUnavailableResponse.openapi_response(
+        examples=["llama stack", "kubernetes api"]
+    ),
+}
+
 @router.delete(
     "/vector-stores/{vector_store_id}",
-    responses={
-        "204": {"description": "Vector store deleted"},
-        503: ServiceUnavailableResponse.openapi_response(
-            examples=["llama stack", "kubernetes api"]
-        ),
-    },
+    responses=vector_store_delete_responses,
     status_code=status.HTTP_204_NO_CONTENT,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 363 - 368, The
delete_vector_store endpoint's responses currently list only 204 and a 503 built
via ServiceUnavailableResponse.openapi_response; update its responses dict to
reuse the module-level response map (the shared responses variable defined at
top of this module) so the OpenAPI for delete_vector_store includes 401, 403,
404, and 500 in addition to 204 and 503, matching the handler's runtime errors
and docstring; modify the responses mapping for the delete endpoint to merge or
reference that module-level map rather than hardcoding only 204/503.
docs/openapi.json (2)

147-163: ⚠️ Potential issue | 🟠 Major

Attach the 503 schema to application/json, not text/html.

Line 147 documents a JSON 503 payload but omits the schema, while Lines 159-163 attach the JSON ServiceUnavailableResponse model to text/html. Please fix the source response metadata and regenerate this generated file; if HTML is intentional, use a string/HTML schema there instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/openapi.json` around lines 147 - 163, The OpenAPI response media types
are misassigned: attach the ServiceUnavailableResponse schema to the
"application/json" media type and remove or replace the schema under "text/html"
(use a plain string/HTML schema if HTML is intentional); update the response
object where "application/json" currently only has examples and "text/html"
references "#/components/schemas/ServiceUnavailableResponse", adjust so
"application/json" contains "$ref":
"#/components/schemas/ServiceUnavailableResponse" (and an example) and
"text/html" uses type: string or an appropriate HTML schema, then regenerate the
docs/openapi.json so the generated file reflects this change.

147-163: ⚠️ Potential issue | 🟠 Major

Attach the 503 schema to application/json, not text/html.

Line 147 documents a JSON 503 payload but omits the schema, while Lines 159-163 attach the JSON ServiceUnavailableResponse model to text/html. Please fix the source response metadata and regenerate this generated file; if HTML is intentional, use a string/HTML schema there instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/openapi.json` around lines 147 - 163, The OpenAPI response has the
ServiceUnavailableResponse schema incorrectly attached to "text/html" instead of
the documented JSON example: move the "$ref":
"#/components/schemas/ServiceUnavailableResponse" entry from the "text/html"
media type to the "application/json" media type (so the "application/json" media
type contains both the example and the schema), and if "text/html" should
remain, replace its schema with a simple string/HTML schema (e.g., type: string,
format: html) or remove it; after making this change for the response that uses
ServiceUnavailableResponse, regenerate the openapi.json file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/conversations_v1.py`:
- Around line 95-97: The ServiceUnavailableResponse examples list incorrectly
includes "llama stack" for the get_conversations_list_endpoint_handler; update
the ServiceUnavailableResponse.openapi_response call in conversations_v1.py to
remove the "llama stack" example so the examples only reflect services the
handler actually touches (e.g., keep "kubernetes api" or other local/db-relevant
examples), ensuring the examples list passed to
ServiceUnavailableResponse.openapi_response no longer contains "llama stack".
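A minimal sketch of what "restricting the examples list" means. The `build_503_response` helper below is a hypothetical stand-in; the actual signature of `ServiceUnavailableResponse.openapi_response` in the repo may differ.

```python
def build_503_response(examples: list[str]) -> dict:
    """Build an OpenAPI 503 response entry with one named example per
    backing service (illustrative stand-in for the repo's helper)."""
    return {
        503: {
            "description": "Service Unavailable",
            "content": {
                "application/json": {
                    "examples": {
                        name: {"value": {"detail": f"{name} is unavailable"}}
                        for name in examples
                    }
                }
            },
        }
    }


# The conversations-list handler never reaches Llama Stack, so only the
# auth-related Kubernetes API example is documented:
responses = build_503_response(["kubernetes api"])
assert "llama stack" not in responses[503]["content"]["application/json"]["examples"]
```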

In `@src/app/endpoints/health.py`:
- Around line 50-52: The current OpenAPI mapping uses
ServiceUnavailableResponse.openapi_response with a "llama stack" example which
is misleading because Llama Stack readiness failures are represented as
ProviderHealthStatus entries inside a ReadinessResponse returned with HTTP 503;
update the OpenAPI responses so the ServiceUnavailableResponse.openapi_response
no longer lists "llama stack" (keep it Kubernetes-only) and add or modify a
ReadinessResponse (or a readiness-specific 503 schema) that includes
ProviderHealthStatus examples for Llama Stack failures; locate and change the
ServiceUnavailableResponse.openapi_response call and add/update a
ReadinessResponse (or readiness 503) example referencing ProviderHealthStatus to
represent Llama Stack failures correctly.
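The distinction drawn here can be sketched as example payloads. Field names for `ReadinessResponse` and `ProviderHealthStatus` below are illustrative assumptions, not taken from the repo's models.

```python
# Hypothetical readiness 503 payload: a Llama Stack outage shows up as an
# unhealthy provider entry inside the readiness body, not as a generic
# ServiceUnavailableResponse.
readiness_503_example = {
    "ready": False,
    "reason": "1 provider unhealthy",
    "providers": [
        {
            "provider_id": "llama-stack",
            "status": "Error",
            "message": "connection refused",
        }
    ],
}

assert readiness_503_example["ready"] is False
assert any(p["status"] != "OK" for p in readiness_503_example["providers"])
```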

---

Duplicate comments:
In `@docs/openapi.json`:
- Around line 147-163: The OpenAPI response media types are misassigned: attach
the ServiceUnavailableResponse schema to the "application/json" media type and
remove or replace the schema under "text/html" (use a plain string/HTML schema
if HTML is intentional); update the response object where "application/json"
currently only has examples and "text/html" references
"#/components/schemas/ServiceUnavailableResponse", adjust so "application/json"
contains "$ref": "#/components/schemas/ServiceUnavailableResponse" (and an
example) and "text/html" uses type: string or an appropriate HTML schema, then
regenerate the docs/openapi.json so the generated file reflects this change.
- Around line 147-163: The OpenAPI response has the ServiceUnavailableResponse
schema incorrectly attached to "text/html" instead of the documented JSON
example: move the "$ref": "#/components/schemas/ServiceUnavailableResponse"
entry from the "text/html" media type to the "application/json" media type (so
the "application/json" media type contains both the example and the schema), and
if "text/html" should remain, replace its schema with a simple string/HTML
schema (e.g., type: string, format: html) or remove it; after making this change
for the response that uses ServiceUnavailableResponse, regenerate the
openapi.json file.

In `@src/app/endpoints/vector_stores.py`:
- Around line 793-798: The OpenAPI responses for the delete-vector-store-file
endpoint are incomplete: the current responses dict only lists 204 and 503 but
the handler can also return 401, 403, 404 and 500; update the responses mapping
used in the route (the responses parameter in the delete_vector_store_file
endpoint decorator / responses variable) to include entries for 401
(UnauthorizedResponse), 403 (ForbiddenResponse), 404 (NotFoundResponse), and 500
(InternalServerErrorResponse) or replace the inline dict with a shared
FILE_DELETE_RESPONSES map that contains all of these codes plus the existing 204
and 503 to ensure the OpenAPI doc matches the handler behavior.
- Around line 363-368: The delete_vector_store endpoint's responses currently
list only 204 and a 503 built via ServiceUnavailableResponse.openapi_response;
update its responses dict to reuse the module-level response map (the shared
responses variable defined at top of this module) so the OpenAPI for
delete_vector_store includes 401, 403, 404, and 500 in addition to 204 and 503,
matching the handler's runtime errors and docstring; modify the responses
mapping for the delete endpoint to merge or reference that module-level map
rather than hardcoding only 204/503.
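The shared-map approach suggested in these two comments can be sketched as a dict merge. Status codes come from the comments above; the payload descriptions are placeholders, and the actual module-level map in `vector_stores.py` may be named differently.

```python
# Module-level response map shared across vector-store endpoints
# (descriptions are illustrative placeholders).
SHARED_RESPONSES = {
    401: {"description": "Unauthorized"},
    403: {"description": "Forbidden"},
    404: {"description": "Not Found"},
    500: {"description": "Internal Server Error"},
    503: {"description": "Service Unavailable"},
}

# The delete endpoints merge the shared map instead of hardcoding 204/503,
# so the OpenAPI doc matches the handler's runtime errors:
delete_responses = {**SHARED_RESPONSES, 204: {"description": "Deleted"}}

assert set(delete_responses) == {204, 401, 403, 404, 500, 503}
```

Merging keeps a single source of truth for the error codes, so adding a new shared error later updates every endpoint that references the map.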
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6156f0ff-8907-471a-82a9-447aae224746

📥 Commits

Reviewing files that changed from the base of the PR and between 3720e95 and 5cc7ca2.

📒 Files selected for processing (24)
  • docs/openapi.json
  • src/app/endpoints/authorized.py
  • src/app/endpoints/config.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/health.py
  • src/app/endpoints/info.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/mcp_servers.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/models.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/query.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/root.py
  • src/app/endpoints/shields.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/vector_stores.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/models.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/config.py
  • src/app/endpoints/query.py
  • src/app/endpoints/info.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/health.py
  • src/app/endpoints/root.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/mcp_servers.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/models.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/config.py
  • src/app/endpoints/query.py
  • src/app/endpoints/info.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/health.py
  • src/app/endpoints/root.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/mcp_servers.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/models.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/config.py
  • src/app/endpoints/query.py
  • src/app/endpoints/info.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/health.py
  • src/app/endpoints/root.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/mcp_servers.py
src/**/config*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/config*.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic models
Use @field_validator and @model_validator for custom validation in Pydantic models
Use type hints like Optional[FilePath], PositiveInt, SecretStr in Pydantic models

Files:

  • src/app/endpoints/config.py
🧠 Learnings (4)
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoints

Applied to files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/models.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/config.py
  • src/app/endpoints/info.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/mcp_servers.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/shields.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/models.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/config.py
  • src/app/endpoints/query.py
  • src/app/endpoints/info.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/stream_interrupt.py
  • src/app/endpoints/conversations_v1.py
  • src/app/endpoints/mcp_auth.py
  • src/app/endpoints/health.py
  • src/app/endpoints/root.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/vector_stores.py
  • src/app/endpoints/prompts.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/mcp_servers.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/app/endpoints/tools.py
🔇 Additional comments (23)
src/app/endpoints/shields.py (1)

35-37: LGTM — the 503 examples are scoped correctly.

This endpoint can hit both the Llama Stack backend and authenticated platform dependencies, so documenting both examples is consistent with the route behavior.

src/app/endpoints/rags.py (1)

37-39: LGTM — both RAG routes now document the applicable 503 cases.

The list and detail handlers both depend on authenticated access and Llama Stack vector-store calls, so the two examples are appropriate.

Also applies to: 48-50

src/app/endpoints/models.py (1)

69-71: LGTM — the documented 503 examples match this endpoint.

/models can fail due to Llama Stack connectivity and authenticated platform dependencies, so including both examples is consistent.

src/app/endpoints/rlsapi_v1.py (1)

90-92: LGTM — the 503 documentation reflects the inference failure surfaces.

The endpoint can encounter both Llama Stack service failures and authenticated platform dependency failures, so these examples are appropriate.

src/app/endpoints/query.py (1)

92-94: LGTM — the added 503 examples are consistent with /query.

This route has both Llama Stack calls and authenticated platform dependencies, so documenting both cases fits the implementation.

src/app/endpoints/streaming_query.py (1)

142-144: LGTM — the streaming route’s 503 examples are appropriate.

The endpoint can fail with a 503 before stream creation for backend/platform dependency issues, so both examples are valid.

src/app/endpoints/responses.py (1)

134-136: LGTM — /responses documents the relevant 503 cases.

Both Llama Stack failures and authenticated platform dependency failures are applicable to this endpoint.

src/app/endpoints/config.py (1)

18-18: LGTM — the 503 example is correctly limited to Kubernetes API.

/config is authenticated but does not call Llama Stack, so excluding the Llama Stack example avoids the inconsistency this PR targets.

Also applies to: 32-32

src/app/endpoints/providers.py (1)

38-52: LGTM!

Both providers_list_responses and provider_get_responses correctly include "llama stack" and "kubernetes api" examples, consistent with the handlers' actual 503 source (Llama Stack APIConnectionError at lines 92-95 and 163-166) plus the auth middleware's Kubernetes API dependency.

src/app/endpoints/feedback.py (1)

25-54: LGTM!

Restricting the 503 example to "kubernetes api" is appropriate since neither feedback_endpoint_handler nor update_feedback_status reaches Llama Stack; the only 503 source here is the auth middleware (Kubernetes API).

src/app/endpoints/root.py (1)

16-16: LGTM!

Handler just returns static HTML, so a 503 can only originate from the auth middleware — "kubernetes api" example is the right (and only) one to document.

Also applies to: 787-787

src/app/endpoints/tools.py (1)

99-101: LGTM!

Both examples are justified: the handler raises 503 on Llama Stack APIConnectionError (lines 149-152, 199-204), and the auth middleware can surface Kubernetes API 503s.

src/app/endpoints/authorized.py (1)

14-14: LGTM!

Handler does not touch Llama Stack; restricting the 503 example to "kubernetes api" accurately reflects the only realistic 503 source (auth backend).

Also applies to: 25-25

src/app/endpoints/stream_interrupt.py (1)

16-16: LGTM!

Handler only interacts with the local StreamInterruptRegistry, so "kubernetes api" (auth middleware) is the correct sole 503 example.

Also applies to: 33-33

src/app/endpoints/metrics.py (1)

32-32: LGTM!

Metrics endpoint does not call Llama Stack; scoping the 503 example to "kubernetes api" is consistent with the PR's stated intent of avoiding inconsistent examples for non-Llama-Stack endpoints.

src/app/endpoints/mcp_auth.py (1)

20-20: LGTM!

Handler reads from local configuration only — no Llama Stack call — so the "kubernetes api"-only example correctly represents the auth-middleware 503 path.

Also applies to: 34-34

src/app/endpoints/info.py (1)

32-34: LGTM — the 503 examples match this endpoint’s failure paths.

/info can hit Llama Stack via client.inspect.version() and is also authenticated, so documenting both examples is consistent.

src/app/endpoints/prompts.py (1)

44-46: LGTM — prompt endpoints consistently document both 503 sources.

Each prompt handler reaches Llama Stack and is authenticated, so the "llama stack" and "kubernetes api" examples are appropriate.

Also applies to: 54-56, 66-68, 78-80, 89-91

src/app/endpoints/health.py (1)

59-59: LGTM — liveness only documents the auth/Kubernetes 503 path.

The liveness handler does not call Llama Stack, so keeping this to "kubernetes api" matches the PR objective.

src/app/endpoints/conversations_v1.py (1)

68-70: LGTM — these conversation endpoints access Llama Stack.

The get/delete/update handlers all call Llama Stack APIs and are authenticated, so documenting both 503 examples is consistent.

Also applies to: 83-85, 109-111

src/app/endpoints/vector_stores.py (1)

58-60: LGTM — vector-store 503 examples match the Llama Stack-backed handlers.

These response maps are used by handlers that call Llama Stack and require auth, so both examples are appropriate.

Also applies to: 69-71, 80-82, 91-93, 102-104

src/app/endpoints/mcp_servers.py (1)

41-43: LGTM — MCP 503 examples are scoped correctly.

Register/delete include Llama Stack because they call toolgroup APIs; list stays Kubernetes-only because it only reads local configuration.

Also applies to: 131-131, 182-184

src/app/endpoints/conversations_v2.py (1)

26-78: LGTM!

The ServiceUnavailableResponse import and the 503 entries added to all four response maps are consistent with the PR's goal: since these conversation endpoints interact with the conversation cache (and potentially the Kubernetes API via auth), restricting the 503 example to "kubernetes api" (and excluding "llama stack") correctly avoids the inconsistent example noted in the PR description.

@asimurka asimurka force-pushed the refactor_503_responses branch from 5cc7ca2 to 3e2e038 Compare April 22, 2026 13:16