LCORE-1853: Add relevance cutoff score to BYOK RAG #1525
syedriko wants to merge 3 commits into lightspeed-core:main
Conversation
Walkthrough: Added a per-BYOK inline RAG configuration option, `relevance_cutoff_score`, which drops retrieved chunks whose raw score falls below the cutoff before `score_multiplier` weighting is applied.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as API / RAG Layer
    participant Config as Configuration
    participant VectorSearch as Vector Search
    participant Store as Vector Store
    participant Extraction as Chunk Extraction
    Client->>API: request RAG (query, store ids)
    API->>Config: relevance_cutoff_mapping()
    Config-->>API: {vector_db_id: cutoff_score, ...}
    API->>VectorSearch: fetch BYOK RAG (query, stores, cutoffs)
    loop per BYOK store
        VectorSearch->>VectorSearch: determine cutoff_score for store
        VectorSearch->>Store: query(store, params with score_threshold=cutoff_score)
        Store-->>VectorSearch: raw retrieval results (items with raw score)
        VectorSearch->>Extraction: extract_byok_rag_chunks(results, cutoff_score)
        Extraction->>Extraction: filter out raw score < cutoff_score
        Extraction->>Extraction: compute weighted_score (apply score_multiplier)
        Extraction-->>VectorSearch: filtered, weighted chunks
    end
    VectorSearch-->>API: aggregated RAG results
    API-->>Client: response with rag_chunks
```
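The filter-then-weight ordering in the loop above can be sketched in plain Python. This is a minimal illustration of the described flow, not the project's actual implementation (which, per the review below, lives in `src/utils/vector_search.py`):

```python
def extract_byok_rag_chunks(results, cutoff_score, score_multiplier):
    """Drop chunks below the raw-score cutoff, then apply the weight.

    Mirrors the ordering in the diagram: the cutoff compares against the
    RAW retrieval score, before score_multiplier is applied.
    """
    chunks = []
    for content, raw_score in results:
        if raw_score < cutoff_score:  # filter on the raw score first
            continue
        chunks.append((content, raw_score * score_multiplier))
    return chunks


# A raw score of 0.375 is dropped by a 0.5 cutoff even though a 10x
# multiplier would have pushed its weighted score above the cutoff.
print(extract_byok_rag_chunks([("a", 0.75), ("b", 0.375)], 0.5, 10.0))
# → [('a', 7.5)]
```

The key design point is that the cutoff and the multiplier operate on different scales: the cutoff is a property of the store's raw similarity scores, while the multiplier only re-weights chunks that already passed.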
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes. Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llama_stack_configuration.py`:
- Around line 44-50: The _raw_byok_rag_store_list function must always return a
real list: when raw_byok_rag is a dict, fetch entries =
raw_byok_rag.get("entries") and then validate its type—if entries is a list
return it, if entries is a dict return [entries] (wrap single-entry dict),
otherwise return [] (handle None or strings). Update _raw_byok_rag_store_list
accordingly so the downstream enrich_byok_rag call can safely iterate and call
brag.get(...) on each item.
In `@src/models/config.py`:
- Around line 1626-1630: The normalizer _normalize_byok_rag_input only converts
list inputs but doesn't handle explicit nulls, causing Pydantic to raise a
dict-type error instead of treating the field as the default; update
_normalize_byok_rag_input to explicitly handle None (e.g., if value is None:
return {} or return a structure matching the default like {"entries": []}), keep
existing behavior for list (return {"entries": value}) and for dicts return
value unchanged so explicit nulls are accepted as the default-friendly shape.
- Around line 1658-1663: The helper _default_byok_rag_section is redundantly
re-specifying defaults already defined on ByokRagSection; simplify by returning
ByokRagSection() (or remove the helper and set default_factory=ByokRagSection on
the byok_rag field of Configuration) so defaults live in one place—update the
_default_byok_rag_section function to return ByokRagSection() or switch the
byok_rag field to use default_factory=ByokRagSection and remove the helper.
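The single-source-of-defaults idea can be sketched with stdlib dataclasses; Pydantic's `Field(default_factory=...)` behaves the same way for the real `ByokRagSection`. Field names here mirror the PR's config shape but the classes are simplified stand-ins:

```python
from dataclasses import dataclass, field


@dataclass
class ByokRagSection:
    # Defaults are declared once, here, and nowhere else.
    entries: list = field(default_factory=list)
    relevance_cutoff_score: float = 0.0


@dataclass
class Configuration:
    # No helper re-stating defaults: just build a default section.
    byok_rag: ByokRagSection = field(default_factory=ByokRagSection)


cfg = Configuration()
print(cfg.byok_rag.relevance_cutoff_score)  # → 0.0
```

Using `default_factory` also guarantees each `Configuration` instance gets its own fresh section rather than a shared mutable default.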
In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 892-896: The test currently sets
test_config.configuration.byok_rag.relevance_cutoff_score to 0.0 so it doesn't
cover the new rule that raw scores below cutoff must be dropped before applying
score_multiplier; add a new async test (e.g.,
test_query_byok_cutoff_applied_before_multiplier) that sets a non-zero
relevance_cutoff_score (e.g., 0.5), constructs a byok entry with a high
score_multiplier (entry.score_multiplier = 10.0), mocks
AsyncLlamaStackClientHolder and the vector_io.query to return a raw score below
the cutoff (e.g., 0.4), and then calls query_endpoint_handler with QueryRequest
to assert that response.rag_chunks is empty; use existing helpers like
_build_base_mock_client and ensure mock_client.vector_stores.list returns empty
data so the behavior is isolated.
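The invariant that test would pin down, reduced to its essence (numbers taken from the comment above; the real test would go through `query_endpoint_handler` with mocked clients):

```python
def filter_then_weight(raw_scores, cutoff, multiplier):
    """Cutoff applies to raw scores before the multiplier."""
    return [s * multiplier for s in raw_scores if s >= cutoff]


# cutoff 0.5 with a 10x multiplier: a raw 0.375 never survives,
# even though 0.375 * 10.0 would exceed the cutoff after weighting.
print(filter_then_weight([0.375], cutoff=0.5, multiplier=10.0))  # → []
```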
- Around line 268-272: Replace the overly-permissive MagicMock used for
test_config.configuration.byok_rag (_br) with a stricter test double that only
exposes the required attributes (entries and relevance_cutoff_score) to avoid
masking interface regressions; create a simple plain object (e.g., a small
dataclass/Plain object or SimpleNamespace-like fixture) and assign byok_entry to
its entries and 0.0 to its relevance_cutoff_score, then use that object in place
of _br in the test (also update the other occurrences listed: the similar mocks
at lines ~300-304, 434-438, 498-502, 699-703, 818-822, 892-896, 977-981,
1052-1056) so the tests only allow the expected attributes and behaviors.
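The stricter test double the comment asks for can be built with `types.SimpleNamespace`; attribute names mirror the config section, and anything outside the declared interface raises:

```python
from types import SimpleNamespace

# Only the attributes the code under test is allowed to touch:
byok_entry = SimpleNamespace(vector_db_id="db-1", score_multiplier=1.0)
byok_rag = SimpleNamespace(entries=[byok_entry], relevance_cutoff_score=0.0)

print(byok_rag.entries[0].vector_db_id)  # → db-1

# Unlike MagicMock, an unexpected attribute fails loudly instead of
# silently returning another mock:
try:
    byok_rag.relevance_cutoff  # typo / interface drift
except AttributeError as exc:
    print("caught:", exc)
```

This is why the comment calls MagicMock "overly permissive": a renamed config attribute would keep passing with MagicMock but fails immediately with a plain namespace object.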
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: df544e84-7577-4246-8a5e-eb94d8f8c279
📒 Files selected for processing (17)
docs/byok_guide.md, src/app/endpoints/rags.py, src/client.py, src/configuration.py, src/constants.py, src/llama_stack_configuration.py, src/models/config.py, src/utils/responses.py, src/utils/vector_search.py, tests/integration/endpoints/test_query_byok_integration.py, tests/integration/endpoints/test_streaming_query_byok_integration.py, tests/unit/app/endpoints/test_rags.py, tests/unit/models/config/test_byok_rag.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/telemetry/conftest.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: integration_tests (3.12)
- GitHub Check: integration_tests (3.13)
- GitHub Check: Pyright
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: unit_tests (3.13)
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: unit_tests (3.12)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Pylinter
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
- Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
- Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
- Check `constants.py` for shared constants before defining new ones
- All modules start with descriptive docstrings explaining purpose
- Use `logger = get_logger(__name__)` from `log.py` for module logging
- Type aliases defined at module level for clarity
- All functions require docstrings with brief descriptions
- Use complete type annotations for function parameters and return types
- Use union types with modern syntax: `str | int` instead of `Union[str, int]`
- Use `Optional[Type]` for optional types in type annotations
- Use snake_case with descriptive, action-oriented names for functions (`get_`, `validate_`, `check_`)
- Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
- Use `async def` for I/O operations and external API calls
- Handle `APIConnectionError` from Llama Stack in error handling
- Use `logger.debug()` for detailed diagnostic information
- Use `logger.info()` for general information about program execution
- Use `logger.warning()` for unexpected events or potential problems
- Use `logger.error()` for serious problems that prevented function execution
- All classes require descriptive docstrings explaining purpose
- Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
- Use complete type annotations for all class attributes; avoid using `Any`
- Follow Google Python docstring conventions for all modules, classes, and functions
- Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed

Files: src/client.py, src/app/endpoints/rags.py, src/configuration.py, src/constants.py, src/utils/responses.py, src/utils/vector_search.py, src/llama_stack_configuration.py, src/models/config.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
- Use pytest for all unit and integration tests; do not use unittest
- Use the `pytest.mark.asyncio` marker for async unit tests

Files: tests/unit/app/endpoints/test_rags.py, tests/unit/telemetry/conftest.py, tests/integration/endpoints/test_query_byok_integration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/utils/test_responses.py, tests/unit/models/config/test_byok_rag.py, tests/unit/utils/test_vector_search.py, tests/integration/endpoints/test_streaming_query_byok_integration.py

tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
- Use `pytest-mock` for AsyncMock objects in unit tests

Files: tests/unit/app/endpoints/test_rags.py, tests/unit/telemetry/conftest.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/utils/test_responses.py, tests/unit/models/config/test_byok_rag.py, tests/unit/utils/test_vector_search.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
- Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies
- Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Files: src/app/endpoints/rags.py

tests/**/conftest.py
📄 CodeRabbit inference engine (AGENTS.md)
- Use `conftest.py` for shared pytest fixtures

Files: tests/unit/telemetry/conftest.py
src/models/config.py
📄 CodeRabbit inference engine (AGENTS.md)
- All config uses Pydantic models extending `ConfigurationBase`
- Base class sets `extra="forbid"` to reject unknown fields in Pydantic configuration models
- Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
- Pydantic configuration models should extend `ConfigurationBase`

Files: src/models/config.py

src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
- Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
- Use `typing_extensions.Self` for model validators in type annotations
- Pydantic data models should extend `BaseModel`
- Include `Attributes:` section in Pydantic model docstrings

Files: src/models/config.py
🧠 Learnings (9)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/rags.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
src/constants.py, src/utils/vector_search.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
tests/unit/models/config/test_dump_configuration.py, tests/unit/models/config/test_byok_rag.py, src/models/config.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Central `constants.py` for shared constants with descriptive comments
Applied to files:
tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
Applied to files:
tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests
Applied to files:
tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🔇 Additional comments (11)
src/app/endpoints/rags.py (1)
- 166-167: Good alignment with `ByokRagSection.entries`. This keeps `/rags/{rag_id}` resolution consistent with the new BYOK config shape.

tests/unit/utils/test_responses.py (2)
- 93-98: Useful test helper for the new BYOK shape. Centralizing section mocking (`entries` + cutoff) improves consistency across test setup.
- 1654-1656: Nice migration of tests to the `byok_rag` section object. These updates correctly track runtime usage of `configuration.configuration.byok_rag.entries`. Also applies to: 1677-1677, 1707-1709, 1732-1732, 1757-1759, 1777-1777, 1799-1799, 1824-1824, 3372-3372, 3397-3397, 3434-3434, 3493-3493.

src/constants.py (1)
- 189-190: Good constant centralization for cutoff/threshold defaults. This keeps BYOK and Solr retrieval defaults explicit and reusable. Also applies to: 195-196.

src/configuration.py (1)
- 482-484: Correct mapping update to `byok_rag.entries`. Both ID and score-multiplier mappings now match the new configuration schema. Also applies to: 509-510.

docs/byok_guide.md (1)
- 82-87: Clear and accurate BYOK cutoff documentation. The updated guide explains scope, ordering, and config shape well, including legacy compatibility. Also applies to: 284-304, 330-333.

tests/unit/models/config/test_byok_rag.py (1)
- 9-15: Strong config-model coverage for BYOK schema evolution. These tests validate both backward compatibility (legacy list) and cutoff validation constraints. Also applies to: 195-249, 251-257.

src/utils/vector_search.py (1)
- 75-107: Cutoff propagation and ordering are implemented correctly. The BYOK raw-score filter is applied before weighting and is consistently threaded from config to store queries. Also applies to: 177-207, 391-420, 546-548.

tests/unit/utils/test_vector_search.py (2)
- 28-33: Good test migration to the `ByokRagSection` shape. Using section-shaped mocks keeps tests aligned with runtime config access (`byok_rag.entries`). Also applies to: 427-427, 446-448, 486-488, 528-530, 573-573, 593-593, 703-704, 723-725.
- 109-114: Nice cutoff behavior coverage. These tests verify both low-level chunk filtering and end-to-end BYOK cutoff behavior through `_fetch_byok_rag`. Also applies to: 134-139, 145-170, 606-644.

src/models/config.py (1)
- 1958-1966: Migration verification complete: all downstream consumers properly access the new section shape. All references to `configuration.byok_rag` correctly use `.entries` or `.relevance_cutoff_score`. No sites were found that treat `byok_rag` as a direct list via iteration (`for rag in config.byok_rag:`) or indexing (`config.byok_rag[0]`). The constant `DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE` is properly defined and used throughout.
Force-pushed from 01af5f7 to 7f34e7b.
are-ces left a comment:
LGTM overall, couple of comments to address 👍
Quoted config under review:

```yaml
vector_db_id: your-index-id   # Llama Stack vector store ID (from index generation)
db_path: /path/to/vector_db/faiss_store.db
score_multiplier: 1.0         # Optional: weight results when mixing multiple sources
relevance_cutoff_score: 0.3   # Optional; min raw score per BYOK store before score_multiplier (BYOK only)
```
Wouldn't it be better to give fine control to users by having a per-store relevance_cutoff_score?
Different vector stores can give very different scores (depending on chunking strategy, embedding model, etc..).
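One way to express the fine-grained control the reviewer asks for: let each entry carry an optional cutoff and fall back to the section-level value. The field names mirror the PR's config, but the per-entry override itself is the reviewer's proposal, not the merged shape:

```python
def relevance_cutoff_mapping(entries, section_cutoff):
    """Resolve one cutoff per vector store, preferring per-entry overrides."""
    return {
        e["vector_db_id"]: e.get("relevance_cutoff_score", section_cutoff)
        for e in entries
    }


entries = [
    {"vector_db_id": "faiss-a", "relevance_cutoff_score": 0.6},  # per-store override
    {"vector_db_id": "faiss-b"},                                 # inherits section value
]
print(relevance_cutoff_mapping(entries, section_cutoff=0.3))
# → {'faiss-a': 0.6, 'faiss-b': 0.3}
```

This keeps a single knob for the common case while letting stores with very different score distributions tune their own threshold.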
Quoted code under review:

```python
search_response = await client.vector_io.query(
    vector_store_id=vector_store_id,
    query=query,
    params={
```
Instead of applying the cutoff after the query, why don't we use the llama-stack parameter?

```python
params={
    "max_chunks": constants.BYOK_RAG_MAX_CHUNKS,
    "mode": "vector",
    "score_threshold": byok_raw_cutoff_score,
}
```
Also, we could implement the above solution for tool-based RAG. I don't see why it shouldn't support a cutoff score 😄
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/utils/vector_search.py (1)
- 75-130: 🧹 Nitpick | 🔵 Trivial: Consider pushing the cutoff into `vector_io.query` params (`score_threshold`) instead of post-filtering. Right now `BYOK_RAG_MAX_CHUNKS` candidates are retrieved per store and then filtered client-side, so an aggressive cutoff silently shrinks the returned set even when the store has more above-cutoff chunks available. Passing `"score_threshold": byok_raw_cutoff_score` (alongside `max_chunks` and `mode`) lets llama-stack return up to `max_chunks` chunks that already satisfy the threshold and removes the need for the per-chunk filter in `_extract_byok_rag_chunks`. This was raised previously and is still applicable. Also consider whether Tool RAG (`file_search`) should support the same cutoff for consistency, since the configuration is per BYOK store and currently only affects Inline RAG. Also applies to: 178-211.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 75 - 130, The current BYOK cutoff is applied client-side in _extract_byok_rag_chunks by dropping chunks whose score < byok_raw_cutoff_score; instead pass the cutoff into vector_io.query as the "score_threshold" (alongside max_chunks and mode) so the remote query returns up to BYOK_RAG_MAX_CHUNKS already satisfying the threshold, then remove the per-chunk filter and any use of byok_raw_cutoff_score inside _extract_byok_rag_chunks (keep computing weighted_score, doc_id, and logging). Update every call site that queries BYOK stores to include score_threshold=byok_raw_cutoff_score (and ensure AsyncLlamaStackClient.vector_io.query supports it), and consider adding the same score_threshold plumbing for Tool RAG/file_search calls so configuration is consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llama_stack_configuration.py`:
- Around line 44-53: The docstring for _raw_byok_rag_store_list is missing the
Google-style Parameters and Returns sections; update the function docstring to
include a "Parameters:" entry describing raw_byok_rag: Any — raw parsed YAML
value (may be list, None, or other types) and a "Returns:" entry specifying
list[Any] — the input list returned as-is when it is a list, otherwise an empty
list (returns [] for None or any non-list); no Raises section is needed because
the function does not raise exceptions.
In `@src/models/config.py`:
- Line 10: Replace the import of Self from typing with the typing_extensions
version: locate the import line that currently reads "from typing import
Annotated, Any, Literal, Optional, Self" in src/models/config.py and change it
to import Self from typing_extensions while keeping the other imports intact so
validators in this module use typing_extensions.Self (i.e., reference Self from
typing_extensions instead of typing).
In `@tests/unit/utils/test_vector_search.py`:
- Around line 28-33: The test helper _byok_rag_list_mock currently assigns a
default relevance_cutoff_score of 0.0 which differs from production; update the
function to set missing or non-numeric entry.relevance_cutoff_score to
constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE instead, and ensure the test
module imports that constant (or the module exposing it) so fixtures mirror
production behavior and don't accidentally disable filtering.
---
Duplicate comments:
In `@src/utils/vector_search.py`:
- Around line 75-130: The current BYOK cutoff is applied client-side in
_extract_byok_rag_chunks by dropping chunks whose score < byok_raw_cutoff_score;
instead pass the cutoff into vector_io.query as the "score_threshold" (alongside
max_chunks and mode) so the remote query returns up to BYOK_RAG_MAX_CHUNKS
already satisfying the threshold, then remove the per-chunk filter and any use
of byok_raw_cutoff_score inside _extract_byok_rag_chunks (keep computing
weighted_score, doc_id, and logging). Update every call site that queries BYOK
stores to include score_threshold=byok_raw_cutoff_score (and ensure
AsyncLlamaStackClient.vector_io.query supports it), and consider adding the same
score_threshold plumbing for Tool RAG/file_search calls so configuration is
consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 10198947-364d-426e-883e-1089aa4f87f9
📒 Files selected for processing (15)
docs/byok_guide.md, docs/openapi.json, examples/lightspeed-stack-byok-okp-rag.yaml, src/configuration.py, src/constants.py, src/llama_stack_configuration.py, src/models/config.py, src/utils/vector_search.py, tests/integration/endpoints/test_query_byok_integration.py, tests/integration/endpoints/test_streaming_query_byok_integration.py, tests/unit/models/config/test_byok_rag.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_llama_stack_configuration.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Pylinter
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🔇 Additional comments (17)
tests/unit/models/config/test_dump_configuration.py (1)
11-11: LGTM — shared default is used consistently. The dump test now verifies that serialized BYOK RAG entries include `relevance_cutoff_score` using the centralized default constant. Also applies to: 991-991
src/configuration.py (1)
511-527: LGTM — cutoff mapping matches the existing configuration access pattern. The property is typed, guarded, and consistent with `score_multiplier_mapping`, and it supplies the per-store cutoff needed by BYOK retrieval.
examples/lightspeed-stack-byok-okp-rag.yaml (1)
43-50: LGTM — example configuration reflects the new per-store cutoff. The example uses the default non-negative cutoff value and makes clear that the setting is scoped to BYOK stores.
tests/unit/test_llama_stack_configuration.py (1)
10-44: LGTM — helper behavior is covered directly. The added tests pin both supported list input and ignored non-list input, matching the new normalization contract.
tests/integration/endpoints/test_query_byok_integration.py (2)
257-266: LGTM — existing scenarios explicitly disable cutoff filtering. Setting `relevance_cutoff_score = 0.0` keeps these tests focused on their original behavior while satisfying the new BYOK entry interface. Also applies to: 289-298, 426-432, 493-494, 695-695, 806-812, 879-885, 1021-1021, 1094-1095
942-992: LGTM — regression covers cutoff before weighting. This test uses a raw score below the cutoff with a large multiplier, so it exercises the key rule that filtering happens before `score_multiplier` is applied.
src/llama_stack_configuration.py (1)
634-634: LGTM — BYOK enrichment now receives normalized raw input. The helper keeps non-list `byok_rag` values from flowing directly into the enrichment path.
tests/integration/endpoints/test_streaming_query_byok_integration.py (1)
239-248: LGTM — streaming fixtures are aligned with the new BYOK field. The zero cutoff preserves prior test behavior while ensuring the mocked BYOK entries expose the new per-store setting.
Also applies to: 266-275, 346-352, 412-418, 700-700, 756-784, 859-865, 948-948, 1025-1026
src/constants.py (1)
189-190: LGTM — BYOK and OKP score defaults are separated clearly. The new BYOK cutoff default is centralized, and the Solr threshold comment avoids conflating OKP/Solr filtering with BYOK raw-score filtering.
Also applies to: 195-196
docs/openapi.json (2)
6936-6936: LGTM! The `operationId` rename aligns the documentation with the actual POST route.
8077-8083: LGTM! Schema entry for `relevance_cutoff_score` is well-specified (non-negative float, default 0.3) and the description clearly scopes filtering to BYOK inline RAG only.
src/models/config.py (3)
1616-1624: LGTM: cutoff field is constrained and documented. The default comes from `constants.py`, `ge=0` enforces the non-negative requirement, and the description clearly states filtering happens before score weighting.
1636-1657: LGTM: normalization keeps the legacy list shape explicit. Handling `null` as `[]` avoids noisy validation failures, while rejecting top-level mappings prevents ambiguous `byok_rag` shapes.
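The normalization contract described here can be sketched as a small standalone function. This is a hedged illustration of a `BeforeValidator`-style normalizer, not the PR's exact code; the function name and error message are assumptions.

```python
# Illustrative normalizer for the raw `byok_rag` config value.
def normalize_byok_rag(value):
    """Normalize the raw `byok_rag` config value before element validation."""
    if value is None:
        return []  # null -> empty list keeps config omission backward compatible
    if isinstance(value, list):
        return value  # legacy list-of-stores shape passes through unchanged
    # Top-level mappings are ambiguous and rejected before element validation.
    raise ValueError("byok_rag must be a list of store entries, not a mapping")
```

In the real model this logic would run ahead of Pydantic's per-element validation, so an omitted or `null` field never produces a confusing list-validation error.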
1946-1953: LGTM: validated `byok_rag` wiring is consistent. The field keeps backward-compatible omission behavior with `default_factory=list` and documents the cutoff’s BYOK-inline scope.
tests/unit/utils/test_vector_search.py (3)
109-114: LGTM: existing extraction tests opt out of cutoff filtering explicitly. Passing `byok_raw_cutoff_score=0.0` keeps these tests’ prior expectations stable while exercising the new function signature. Also applies to: 134-139
145-169: LGTM: raw cutoff behavior is covered directly. The test verifies low raw scores are excluded before the multiplier affects the surviving chunk’s weighted score.
609-648: LGTM: BYOK fetch cutoff wiring is tested. This covers the configured cutoff path through `_fetch_byok_rag` and confirms only chunks at or above the raw threshold survive.
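The cutoff-before-weighting rule these tests pin down can be sketched as follows. The chunk shape and function signature are assumptions for illustration; the real extraction helper in the PR works on vector-store query responses.

```python
# Hypothetical sketch: drop low raw scores first, then weight survivors.
def extract_byok_rag_chunks(results, byok_raw_cutoff_score, score_multiplier):
    """Drop chunks whose raw score is below the cutoff, then weight survivors."""
    chunks = []
    for item in results:
        raw_score = item["score"]
        if raw_score < byok_raw_cutoff_score:
            continue  # filtering happens BEFORE the multiplier is applied
        chunks.append({**item, "weighted_score": raw_score * score_multiplier})
    return chunks

# A raw score of 0.2 is dropped despite the large multiplier (10.0),
# while 0.5 survives and is weighted to 5.0.
kept = extract_byok_rag_chunks(
    [{"id": "a", "score": 0.2}, {"id": "b", "score": 0.5}],
    byok_raw_cutoff_score=0.3,
    score_multiplier=10.0,
)
```

The ordering matters: if weighting ran first, a large `score_multiplier` could push an irrelevant chunk past the cutoff.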
89c8e2a to
0e12781
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/openapi.json`:
- Line 6936: The OpenAPI spec has duplicate operationId
"handle_a2a_jsonrpc_a2a_post" for both GET and POST on the /a2a path which
breaks client generation; pick unique operationIds (e.g., rename the GET
operation to "handle_a2a_jsonrpc_a2a_get" and keep the POST as
"handle_a2a_jsonrpc_a2a_post" or vice versa), update the operationId fields in
the /a2a GET and POST operations, and then update any server/client code or
routing that references the old operationId to the new name (search for
operationId "handle_a2a_jsonrpc_a2a_post" and the /a2a path to locate spots to
change).
In `@src/configuration.py`:
- Around line 524-527: The dict comprehension over self._configuration.byok_rag
currently overwrites duplicate brag.vector_db_id entries, giving last-wins
semantics; change it to preserve first-wins by iterating the sequence and only
setting mapping[brag.vector_db_id] = brag.relevance_cutoff_score if the key is
not already present (or alternatively validate and raise on duplicates during
configuration validation). Update the code that builds this mapping (the block
returning the dict) to use an explicit loop that checks existence before
assigning so the first encountered vector_db_id and its relevance_cutoff_score
are kept.
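The first-wins fix suggested above can be sketched with an explicit loop. Entries are shown as plain dicts for illustration; the real configuration iterates Pydantic model objects with attribute access.

```python
def build_relevance_cutoff_mapping(byok_rag_entries):
    """Build a first-wins vector_db_id -> cutoff mapping.

    An explicit loop (instead of a dict comprehension) keeps the first
    cutoff seen for a duplicated vector_db_id rather than the last one.
    """
    mapping = {}
    for entry in byok_rag_entries:
        if entry["vector_db_id"] not in mapping:
            mapping[entry["vector_db_id"]] = entry["relevance_cutoff_score"]
    return mapping

# Duplicate "v1" entries: the first cutoff (0.3) wins, 0.9 is ignored.
mapping = build_relevance_cutoff_mapping([
    {"vector_db_id": "v1", "relevance_cutoff_score": 0.3},
    {"vector_db_id": "v1", "relevance_cutoff_score": 0.9},
    {"vector_db_id": "v2", "relevance_cutoff_score": 0.5},
])
```

Raising on duplicates during configuration validation, as the comment notes, is the stricter alternative when silent precedence is undesirable.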
In `@src/models/config.py`:
- Around line 1617-1625: The Field relevance_cutoff_score currently allows
non-finite values (like infinity); add a Pydantic validator on
relevance_cutoff_score (e.g., a method validate_relevance_cutoff_score with
`@validator`('relevance_cutoff_score')) that checks math.isfinite(value) and
raises a ValueError if not finite, keeping the existing ge=0 constraint and same
default (constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) so YAML `.inf` or
other non-finite inputs are rejected with a clear validation error.
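The check the suggested validator would perform can be shown in isolation. This is a plain function mirroring the validation logic, not the Pydantic `@field_validator` itself; the function name is an assumption.

```python
import math

def validate_relevance_cutoff_score(value: float) -> float:
    """Reject non-finite or negative cutoff values, as the validator would."""
    if not math.isfinite(value):
        # Catches YAML `.inf`, `-.inf`, and `.nan` inputs that pass `ge=0`
        # (or slip past it, in the NaN case) without a finiteness check.
        raise ValueError("relevance_cutoff_score must be finite")
    if value < 0:
        raise ValueError("relevance_cutoff_score must be non-negative")
    return value
```

Keeping the `ge=0` Field constraint and adding only the finiteness check preserves the existing error messages for plain negative values.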
In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 991-992: The test currently asserts response.rag_chunks is not
None and len(response.rag_chunks) == 0, which fails to treat None and []
equivalently; replace those two assertions with a single semantic check like
assert not response.rag_chunks (or assert response.rag_chunks in (None, [])) so
the test passes whether rag_chunks is None or an empty list; update the
assertion that references response.rag_chunks in test_query_byok_integration.py
accordingly.
In `@tests/unit/utils/test_responses.py`:
- Around line 93-100: The helper function _byok_rag_list_mock has an unused
_mocker parameter; remove the unused parameter from the function signature and
update any call sites to stop passing the fixture so the helper only accepts
entries: adjust the definition of _byok_rag_list_mock to accept a single
parameter (entries: list[Any]) and remove the corresponding _mocker argument
from tests that call _byok_rag_list_mock; keep the existing logic that sets
entry.relevance_cutoff_score to
constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE when missing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c83d827e-592e-4b8a-ab8d-1b9a1a80a959
📒 Files selected for processing (15)
docs/byok_guide.md, docs/openapi.json, examples/lightspeed-stack-byok-okp-rag.yaml, src/configuration.py, src/constants.py, src/llama_stack_configuration.py, src/models/config.py, src/utils/vector_search.py, tests/integration/endpoints/test_query_byok_integration.py, tests/integration/endpoints/test_streaming_query_byok_integration.py, tests/unit/models/config/test_byok_rag.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_llama_stack_configuration.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use `pytest.mark.asyncio` marker for async unit tests
Files:
tests/unit/models/config/test_dump_configuration.py, tests/unit/test_llama_stack_configuration.py, tests/integration/endpoints/test_streaming_query_byok_integration.py, tests/integration/endpoints/test_query_byok_integration.py, tests/unit/models/config/test_byok_rag.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_vector_search.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/models/config/test_dump_configuration.py, tests/unit/test_llama_stack_configuration.py, tests/unit/models/config/test_byok_rag.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_vector_search.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Use complete type annotations for all class attributes; avoid using `Any`
Follow Google Python docstring conventions for all modules, classes, and functions
Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed
Files:
src/configuration.py, src/constants.py, src/llama_stack_configuration.py, src/utils/vector_search.py, src/models/config.py
src/models/config.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/config.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic configuration models
Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
Pydantic configuration models should extend `ConfigurationBase`
Files:
src/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Use `typing_extensions.Self` for model validators in type annotations
Pydantic data models should extend `BaseModel`
Include `Attributes:` section in Pydantic model docstrings
Files:
src/models/config.py
🧠 Learnings (12)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
tests/unit/models/config/test_dump_configuration.py, tests/unit/models/config/test_byok_rag.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Central `constants.py` for shared constants with descriptive comments
Applied to files:
tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
Applied to files:
tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
src/constants.py, src/utils/vector_search.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : All functions require docstrings with brief descriptions
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests
Applied to files:
tests/unit/utils/test_vector_search.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `typing_extensions.Self` for model validators in type annotations
Applied to files:
src/models/config.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.
Applied to files:
src/models/config.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🔇 Additional comments (21)
tests/unit/models/config/test_dump_configuration.py (1)
11-11: LGTM — serialization expectation tracks the shared default. Using `constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE` keeps the dump test aligned with the config model’s default value. Also applies to: 991-991
tests/unit/test_llama_stack_configuration.py (1)
9-18: LGTM — helper behavior is covered. The tests verify both supported list input and defensive empty-list fallback for non-list `byok_rag` values. Also applies to: 28-44
examples/lightspeed-stack-byok-okp-rag.yaml (1)
43-43: LGTM — example shows the new per-store cutoff. The BYOK example now demonstrates `relevance_cutoff_score` alongside `score_multiplier`, which makes the raw-cutoff-before-weighting behavior discoverable. Also applies to: 50-50
src/constants.py (1)
189-190: LGTM — cutoff default is centralized separately from OKP/Solr thresholding. The new BYOK default is clearly separated from the existing Solr `score_threshold` constant. Also applies to: 195-196
src/llama_stack_configuration.py (1)
44-63: LGTM — BYOK enrichment now receives only the supported raw list form. The helper keeps `generate_configuration()` from passing non-list `byok_rag` shapes into enrichment, and the new unit tests cover the behavior. Also applies to: 644-644
docs/byok_guide.md (1)
82-82: LGTM on docs. The new per-store `relevance_cutoff_score` documentation is clear, covers the default, the `0.0` disable sentinel, the ordering relative to `score_multiplier`, and the scope (BYOK inline only, not OKP, not Tool RAG). This also addresses the prior feedback requesting per-store control instead of a global setting.
tests/unit/models/config/test_byok_rag.py (1)
196-282: LGTM on new validation tests. Good coverage of the new behavior: `None` → `[]` normalization, list-of-stores parsing with default cutoff application, rejection of the mapping shape, and negative-cutoff rejection on individual entries. Tests use `pytest`/`pytest.raises(ValidationError)` per coding guidelines.
docs/openapi.json (2)
8077-8083: Schema addition for `relevance_cutoff_score` is well-specified. Constraints and semantics are clear (`type: number`, `minimum: 0.0`, explicit behavior description, sensible default).
8261-8261: Documentation clarification is accurate and helpful. The updated description clearly communicates list usage and scope of `relevance_cutoff_score`.
src/models/config.py (3)
10-30: LGTM: validator and `Self` imports match the model conventions. `Self` now comes from `typing_extensions`, and `BeforeValidator` is imported for the BYOK list normalizer. Based on learnings, applies to src/models/**/*.py: use `typing_extensions.Self` for model validators in type annotations.
1637-1658: LGTM: BYOK input normalization is explicit. `None` is normalized to an empty list, list inputs are preserved, and invalid top-level shapes fail before element validation.
1947-1953: LGTM: `byok_rag` schema now reflects the list shape and per-store cutoff. The field type and description align with the new BYOK per-store cutoff behavior.
src/utils/vector_search.py (4)
75-130: LGTM: cutoff is applied before score weighting. The raw `score` is compared with `byok_raw_cutoff_score` before `weighted_score` is computed, which matches the intended ordering.
178-208: LGTM: BYOK query helper threads the cutoff through cleanly. The helper keeps retrieval separate from post-processing and passes the per-store cutoff into extraction.
351-463: LGTM: per-store cutoff mapping is applied before final ranking. The flow resolves eligible BYOK stores, applies each store’s raw cutoff, then ranks surviving chunks by weighted score.
542-553: LGTM: BYOK and OKP cutoff behavior is documented separately. The docstring makes it clear that BYOK uses `relevance_cutoff_score`, while OKP/Solr uses the existing query-parameter threshold path.
tests/unit/utils/test_vector_search.py (5)
3-35: LGTM: test fixture now mirrors the production cutoff default. Missing mocked `relevance_cutoff_score` values fall back to `constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE`, avoiding accidental test-only filtering disablement.
111-116: LGTM: existing extraction tests preserve prior behavior explicitly. Passing `byok_raw_cutoff_score=0.0` keeps these tests focused on metadata and weighting behavior. Also applies to: 136-141
147-171: LGTM: cutoff filtering is covered at extraction level. The test verifies that low raw scores are excluded before the surviving score is multiplied.
427-453: LGTM: BYOK fetch fixtures include cutoff mappings. The updated mocks now provide the new `relevance_cutoff_mapping` dependency alongside score and RAG ID mappings. Also applies to: 485-495, 525-537, 576-598, 707-735
611-650: LGTM: fetch-level regression test covers configured cutoff behavior. This verifies `_fetch_byok_rag` drops chunks below the per-store raw cutoff before returning context chunks.
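The per-store-cutoff-then-global-ranking flow described in these comments can be sketched end to end. All names and data shapes here are illustrative assumptions, not the PR's actual helpers.

```python
def rank_byok_chunks(chunks_by_store, cutoff_by_store, multiplier_by_store):
    """Apply each store's raw cutoff, then rank all survivors by weighted score."""
    survivors = []
    for store_id, chunks in chunks_by_store.items():
        cutoff = cutoff_by_store[store_id]
        multiplier = multiplier_by_store[store_id]
        for chunk in chunks:
            if chunk["score"] >= cutoff:  # per-store raw cutoff first
                survivors.append(
                    {**chunk, "weighted_score": chunk["score"] * multiplier}
                )
    # Final ranking is global, across stores, by weighted score.
    return sorted(survivors, key=lambda c: c["weighted_score"], reverse=True)

ranked = rank_byok_chunks(
    {"v1": [{"score": 0.2}, {"score": 0.6}], "v2": [{"score": 0.5}]},
    cutoff_by_store={"v1": 0.3, "v2": 0.3},
    multiplier_by_store={"v1": 1.0, "v2": 2.0},
)
```

The 0.2 chunk is dropped by v1's cutoff before any weighting; the v2 chunk outranks the remaining v1 chunk only because of its multiplier.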
0e12781 to
e371ef4
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/responses.py`:
- Around line 670-715: The code currently combines multiple vector stores into a
single SearchRankingOptions by using max(thresholds) in
file_search_ranking_options_for_vector_stores, which enforces the highest cutoff
across all stores; change the logic to produce one InputToolFileSearch per
vector_store_id with its own SearchRankingOptions.score_threshold (use the
store-specific cutoff: for SOLR use SOLR_VECTOR_SEARCH_DEFAULT_SCORE_THRESHOLD,
for BYOK use br.relevance_cutoff_score, otherwise
DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) instead of a global max; also when
translating explicit client-provided file_search tools (the code path that
checks file_tool.ranking_options and calls
file_tool.model_copy(update=updates)), enforce the store-specific cutoff as a
floor (i.e., set score_threshold = max(client_threshold, store_cutoff) per
vector_store_id) and split/emit separate tool instances for each resolved
vector_store_id so each tool call uses the correct per-store ranking_options.
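The per-store split suggested above can be sketched with plain dicts. This is a hedged illustration only: the real code builds `InputToolFileSearch`/`SearchRankingOptions` objects from llama-stack-api, and the dict keys here merely mirror those fields.

```python
def per_store_file_search_tools(store_cutoffs, client_threshold=None):
    """Emit one file_search tool per vector store, each with its own threshold.

    The configured per-store cutoff acts as a floor: a client-provided
    threshold can raise it but never lower it below the store's value.
    """
    tools = []
    for store_id, cutoff in store_cutoffs.items():
        if client_threshold is None:
            threshold = cutoff
        else:
            threshold = max(client_threshold, cutoff)  # store cutoff is a floor
        tools.append({
            "type": "file_search",
            "vector_store_ids": [store_id],
            "ranking_options": {"score_threshold": threshold},
        })
    return tools

# With a client threshold of 0.4: the 0.3 store is raised to 0.4,
# the 0.5 store keeps its higher configured cutoff.
tools = per_store_file_search_tools(
    {"byok_store": 0.3, "solr_store": 0.5}, client_threshold=0.4
)
```

Compared with a single tool using `max()` across all stores, this keeps a strict store from forcing its high cutoff onto a permissive one.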
- Line 86: The import is using a submodule path for SearchRankingOptions which
is unsupported; update the import to use the flat root-package import by
replacing the submodule import of SearchRankingOptions in src/utils/responses.py
with a root-level import from llama_stack_api so the module imports
SearchRankingOptions directly from the package root (update the import statement
referencing SearchRankingOptions accordingly).
In `@src/utils/vector_search.py`:
- Around line 399-419: The relevance cutoff fallback is unreachable and should
be removed: in the loop that calls _query_store_for_byok_rag, replace
relevance_cutoff_mapping.get(vector_store_id,
constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) with a direct lookup
relevance_cutoff_mapping[vector_store_id] (since vector_store_ids_to_query is
derived from configuration.byok_rag via resolve_vector_store_ids); this makes
intent explicit and allows misconfigurations to surface as errors instead of
silently using the default.
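The fail-loud lookup recommended here is simple to demonstrate. The store id and mapping below are illustrative, not taken from the PR.

```python
relevance_cutoff_mapping = {"vs_byok_1": 0.3}  # hypothetical per-store config

def cutoff_for_store(vector_store_id: str) -> float:
    """Direct indexing: a store id missing from configuration raises KeyError
    loudly, instead of .get(id, DEFAULT) silently masking the misconfiguration."""
    return relevance_cutoff_mapping[vector_store_id]
```

Since the ids being queried are themselves derived from the same configuration, a `KeyError` here can only mean an internal inconsistency worth surfacing.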
In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 968-991: The test's vector_io.query mock currently always returns
a low-score chunk (0.4) which ignores the production contract of honoring
score_threshold; change the mock so mock_client.vector_io.query is an AsyncMock
with a side_effect that reads the incoming score_threshold kwarg/arg and returns
the low_score_resp only when score_threshold <= 0.4, otherwise returns an
empty/_make_vector_io_response(..., []) result; use the existing
_make_vector_io_response helper to construct both responses and keep references
to mock_client.vector_io.query, _make_vector_io_response, and QueryRequest to
locate and implement the change.
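A threshold-honoring mock like the one requested can be sketched as follows. Response shapes are simplified dicts for illustration (the real test uses `_make_vector_io_response` and pytest-mock); only the `side_effect` pattern is the point.

```python
import asyncio
from unittest.mock import AsyncMock

LOW_SCORE_RESP = {"chunks": [{"content": "low", "score": 0.4}]}
EMPTY_RESP = {"chunks": []}

def _query_side_effect(*args, **kwargs):
    """Honor the incoming score_threshold like a real vector store would."""
    threshold = kwargs.get("score_threshold", 0.0)
    return LOW_SCORE_RESP if threshold <= 0.4 else EMPTY_RESP

# AsyncMock wraps the sync side_effect's return value as the awaited result.
query = AsyncMock(side_effect=_query_side_effect)

async def _demo():
    hit = await query("vs_1", score_threshold=0.3)   # 0.3 <= 0.4 -> chunk returned
    miss = await query("vs_1", score_threshold=0.5)  # 0.5 > 0.4 -> empty response
    return hit, miss

hit, miss = asyncio.run(_demo())
```

This keeps the mock consistent with the production contract: a store asked for a 0.5 threshold should never hand back a 0.4-score chunk.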
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 16fb90e9-2655-4a90-8099-91f1b6d7249d
📒 Files selected for processing (18)
docs/byok_guide.md, docs/openapi.json, examples/lightspeed-stack-byok-okp-rag.yaml, src/app/endpoints/a2a.py, src/configuration.py, src/constants.py, src/llama_stack_configuration.py, src/models/config.py, src/utils/responses.py, src/utils/vector_search.py, tests/integration/endpoints/test_query_byok_integration.py, tests/integration/endpoints/test_streaming_query_byok_integration.py, tests/unit/models/config/test_byok_rag.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_configuration.py, tests/unit/test_llama_stack_configuration.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (6)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use `pytest.mark.asyncio` marker for async unit tests
Files:
tests/unit/test_llama_stack_configuration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_configuration.py, tests/integration/endpoints/test_streaming_query_byok_integration.py, tests/integration/endpoints/test_query_byok_integration.py, tests/unit/models/config/test_byok_rag.py, tests/unit/utils/test_vector_search.py, tests/unit/utils/test_responses.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/test_llama_stack_configuration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_configuration.py, tests/unit/models/config/test_byok_rag.py, tests/unit/utils/test_vector_search.py, tests/unit/utils/test_responses.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Use complete type annotations for all class attributes; avoid using `Any`
Follow Google Python docstring conventions for all modules, classes, and functions
Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed
Files:
src/configuration.py, src/constants.py, src/llama_stack_configuration.py, src/utils/responses.py, src/models/config.py, src/utils/vector_search.py, src/app/endpoints/a2a.py
src/models/config.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/config.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic configuration models
Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
Pydantic configuration models should extend `ConfigurationBase`
Files:
src/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Use `typing_extensions.Self` for model validators in type annotations
Pydantic data models should extend `BaseModel`
Include `Attributes:` section in Pydantic model docstrings
Files:
src/models/config.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/a2a.py
🧠 Learnings (17)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
tests/unit/models/config/test_dump_configuration.py
tests/unit/models/config/test_byok_rag.py
src/models/config.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Central `constants.py` for shared constants with descriptive comments
Applied to files:
tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
Applied to files:
tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
src/constants.py
docs/byok_guide.md
src/utils/vector_search.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : All functions require docstrings with brief descriptions
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests
Applied to files:
tests/unit/utils/test_vector_search.py
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Applied to files:
src/utils/responses.py
tests/unit/utils/test_responses.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
Applied to files:
src/utils/responses.py
tests/unit/utils/test_responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `typing_extensions.Self` for model validators in type annotations
Applied to files:
src/models/config.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.
Applied to files:
src/models/config.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/a2a.py
🪛 GitHub Actions: Integration tests
tests/integration/endpoints/test_query_byok_integration.py
[error] 991-991: Test failed: cutoff should be applied before score_multiplier. AssertionError: expected response.rag_chunks to be empty, but got [RAGChunk(content='Content below raw cutoff', source='cutoff-test', score=4.0, attributes={'document_id': 'doc-below'})].
[warning] 1-1: ResourceWarning: unclosed database in sqlite3.Connection object (reported from logging/init.py).
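The failing assertion reflects the intended ordering: the raw retrieval score is compared against the cutoff first, and `score_multiplier` is applied only to the chunks that survive. A minimal sketch of that ordering (the function and data shapes are hypothetical, not the repository's actual helper):

```python
def filter_and_weight(chunks: list[dict], cutoff: float, multiplier: float) -> list[dict]:
    """Drop chunks whose raw score is below the cutoff, then weight the rest.

    The cutoff is compared against the raw retrieval score, so a large
    multiplier cannot rescue a chunk that was already below the cutoff.
    """
    kept = [c for c in chunks if c["score"] >= cutoff]
    return [{**c, "score": c["score"] * multiplier} for c in kept]


chunks = [
    {"content": "Content below raw cutoff", "score": 0.4},
    {"content": "Content above raw cutoff", "score": 0.9},
]
# With cutoff=0.5 and multiplier=10.0, the first chunk is dropped even though
# 0.4 * 10.0 would exceed the cutoff after weighting.
result = filter_and_weight(chunks, cutoff=0.5, multiplier=10.0)
```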
🔇 Additional comments (18)
tests/unit/models/config/test_dump_configuration.py (1)
11-11: LGTM: serialization now checks the centralized BYOK cutoff default. Using the shared constant keeps the dump expectation aligned with the config model default. Based on learnings: central `constants.py` for shared constants with descriptive comments.

Also applies to: 991-991
examples/lightspeed-stack-byok-okp-rag.yaml (1)
43-50: LGTM: the BYOK example now shows the new cutoff setting. Both sample stores include the per-store cutoff, and the inline comment correctly documents that it is applied before `score_multiplier`.

tests/integration/endpoints/test_query_byok_integration.py (1)

253-267: LGTM: explicit zero cutoffs preserve existing test behavior. These fixtures keep non-cutoff-focused tests exercising the pre-existing BYOK behavior while satisfying the new config shape.
Also applies to: 285-299, 422-432, 493-494, 691-695, 802-812, 875-885, 1016-1020, 1093-1094
tests/unit/test_llama_stack_configuration.py (1)
9-18: LGTM: BYOK raw-list normalization is covered. The tests verify both accepted list input and safe fallback for non-list forms.
Also applies to: 28-45
tests/integration/endpoints/test_streaming_query_byok_integration.py (1)
235-248: LGTM: streaming BYOK mocks include the new cutoff field. Using `0.0` keeps these existing streaming scenarios focused on their original behavior while matching the updated BYOK entry interface.

Also applies to: 262-275, 342-352, 408-418, 696-700, 756-784, 855-865, 944-948, 1025-1026
tests/unit/test_configuration.py (1)
1189-1225: LGTM: duplicate cutoff mapping behavior is covered. This test locks in first-wins semantics for repeated `vector_db_id` values.

src/configuration.py (1)

511-531: LGTM: cutoff mapping preserves first-wins semantics. The explicit loop avoids overwriting duplicate `vector_db_id` entries and matches the documented behavior.

src/constants.py (1)
189-196: LGTM: the cutoff default is centralized with related RAG constants. The Solr threshold comment also clarifies scope without changing behavior.
src/utils/vector_search.py (1)
188-197: `score_threshold` is properly supported across all BYOK vector_io providers. The `score_threshold` parameter is natively supported by all three BYOK-compatible providers (`inline::faiss`, `inline::sqlite-vec`, and `remote::pgvector`) with consistent semantics across the unified VectorIORouter. Each provider treats it as a post-filter threshold on normalized similarity scores (higher = better match), ensuring that moving the cutoff server-side via `params["score_threshold"]` applies uniformly to every BYOK query without risk of silent bypass or provider-specific inconsistency.

docs/openapi.json (4)
6934-6936: LGTM: GET operation is now distinct. The GET metadata clearly describes the `/a2a` GET handling, and the `operationId` is unique from POST in the shown spec.
6952-6954: LGTM: POST operationId is unique. This addresses the client-generation risk from duplicate `/a2a` operation IDs.
8077-8083: LGTM: cutoff schema is clear and constrained. The new field documents the raw-score cutoff semantics, non-negative bound, default, and BYOK-only scope.
8261-8261: LGTM: parent schema description matches the per-store model. Calling out `byok_rag: [ ... ]` reduces ambiguity about where `relevance_cutoff_score` belongs.

tests/unit/utils/test_vector_search.py (2)
28-35: LGTM: fixture now mirrors the production cutoff default. The helper keeps mocked BYOK entries aligned with the configured default instead of silently disabling filtering.
611-646: LGTM: cutoff forwarding is covered. This test verifies that the configured BYOK cutoff is sent to `vector_io.query` as `score_threshold`, which matches the feature path.

src/models/config.py (2)
1618-1649: LGTM: cutoff validation is appropriately bounded. The field enforces non-negative values and rejects `inf`/`nan`, preventing silent "drop everything" configurations.
1661-1682: LGTM: BYOK input normalization is explicit. Handling `null` as an empty list while rejecting top-level mappings keeps the accepted YAML shape clear.

src/app/endpoints/a2a.py (1)

695-729: LGTM: route split keeps behavior shared while fixing operation IDs. Both GET and POST preserve authorization and delegate into the common JSON-RPC handler.
Description
In the Lightspeed Stack configuration, the `byok_rag` section gains a configuration variable called `relevance_cutoff_score`, a non-negative float.
When a RAG chunk is retrieved from a BYOK database and its relevance score is less than the cutoff, the chunk is dropped from further consideration. Specifically, this cutoff filtering happens before score multipliers are applied.
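Because each BYOK store carries its own cutoff, the lookup can be pictured as a first-wins mapping from `vector_db_id` to cutoff score. A sketch with hypothetical entries and a hypothetical `0.0` default, not the repository's actual implementation:

```python
def relevance_cutoff_mapping(byok_rag: list[dict]) -> dict[str, float]:
    """Build a vector_db_id -> relevance_cutoff_score mapping.

    If the same vector_db_id appears more than once, the first entry wins
    and later duplicates are ignored rather than overwriting the cutoff.
    """
    mapping: dict[str, float] = {}
    for entry in byok_rag:
        db_id = entry["vector_db_id"]
        if db_id not in mapping:
            mapping[db_id] = entry.get("relevance_cutoff_score", 0.0)
    return mapping


entries = [
    {"vector_db_id": "docs", "relevance_cutoff_score": 0.5},
    {"vector_db_id": "docs", "relevance_cutoff_score": 0.9},  # duplicate, ignored
    {"vector_db_id": "kb"},  # falls back to the assumed 0.0 default
]
cutoffs = relevance_cutoff_mapping(entries)
```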
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Behavior Changes
Documentation