Skip to content

LCORE-1853: Add relevance cutoff score to BYOK RAG#1525

Draft
syedriko wants to merge 3 commits intolightspeed-core:mainfrom
syedriko:syedriko-lcore-1853
Draft

LCORE-1853: Add relevance cutoff score to BYOK RAG#1525
syedriko wants to merge 3 commits intolightspeed-core:mainfrom
syedriko:syedriko-lcore-1853

Conversation

@syedriko
Copy link
Copy Markdown
Contributor

@syedriko syedriko commented Apr 17, 2026

Description

In the Lightspeed Stack configuration, byok_rag section, there is a configuraiton variable called relevance_cutoff_score, a non-negative float.
When a RAG chunk is retrieved from a BYOK database and its relevance score is less then the cutoff, the chunk is dropped from further consideration. Specifically, this cutoff filtering happens before score multipliers are applied.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Per-store relevance_cutoff_score for BYOK inline RAG to drop low-similarity chunks before weighting (default 0.3, 0.0 disables). Applies to BYOK only and is enforced before score_multiplier.
  • Behavior Changes

    • File-search tools now receive ranking_options derived from BYOK cutoffs (single tool threshold = max cutoff across selected stores).
  • Documentation

    • Guides, examples, and OpenAPI updated to reflect cutoff field, defaults, YAML shape, and BYOK-only scope; A2A endpoint docs split GET vs POST.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

Warning

Rate limit exceeded

@syedriko has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 39 minutes and 42 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 39 minutes and 42 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bc498a9e-891f-42af-b7e1-13b9214b3727

📥 Commits

Reviewing files that changed from the base of the PR and between c535d8e and 579c17e.

📒 Files selected for processing (5)
  • src/utils/responses.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py

Walkthrough

Added a per–BYOK inline RAG configuration option relevance_cutoff_score that filters retrieved BYOK chunks by raw similarity before applying score_multiplier. Changes touch configuration schema/validation, config mapping, vector-search filtering, docs/OpenAPI, examples, and tests.

Changes

Cohort / File(s) Summary
Docs & Examples
docs/byok_guide.md, docs/openapi.json, examples/lightspeed-stack-byok-okp-rag.yaml
Documented new relevance_cutoff_score (minimum 0.0, default 0.3), updated examples and RAG comparison table, and added OpenAPI schema property.
Constants & Config API
src/constants.py, src/configuration.py
Added DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE = 0.3 and AppConfig.relevance_cutoff_mapping() to expose per-store cutoff mapping (first-seen wins for duplicate vector_db_id).
Configuration Schema & Normalization
src/models/config.py
Added ByokRag.relevance_cutoff_score: float (ge=0, default constant) with validator rejecting non-finite values. Introduced _normalize_byok_rag_input and ByokRagListValidated to normalize byok_rag input (null[], reject non-list non-null).
Configuration Processing
src/llama_stack_configuration.py
Added _raw_byok_rag_store_list(raw_byok_rag) to ensure enrichment only when the raw YAML byok_rag is a list; generate_configuration() now uses it.
Vector Search Implementation
src/utils/vector_search.py
Threaded per-store relevance_cutoff_score into store queries as score_threshold; _extract_byok_rag_chunks filters out chunks below cutoff before applying weight/score_multiplier. OKP/Solr thresholds remain handled separately.
Tool / Response Helpers
src/utils/responses.py
Changed get_rag_tools(...) signature to accept byok_rags, always set file_search.ranking_options using file_search_ranking_options_for_vector_stores(...), and preserve client-provided ranking_options when present.
A2A Endpoint Routing
src/app/endpoints/a2a.py
Split combined /a2a handler into separate GET and POST handlers delegating to new internal _handle_a2a_jsonrpc helper; updated operation IDs.
Integration Tests
tests/integration/endpoints/test_query_byok_integration.py, tests/integration/endpoints/test_streaming_query_byok_integration.py
Extended BYOK fixtures to include relevance_cutoff_score and added integration test asserting cutoff is applied before score_multiplier.
Unit Tests
tests/unit/models/config/test_byok_rag.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_llama_stack_configuration.py, tests/unit/utils/test_vector_search.py, tests/unit/utils/test_responses.py, tests/unit/test_configuration.py
Added/updated tests for default value, validation (negative/non-finite), input normalization, serialization, _raw_byok_rag_store_list behavior, per-store cutoff mapping semantics, chunk filtering, and helpers to normalize mocked BYOK entries.
Minor / Misc
docs/openapi.json
Fixed separate operationId values for /a2a GET vs POST in OpenAPI and added schema docs for relevance_cutoff_score.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as API / RAG Layer
    participant Config as Configuration
    participant VectorSearch as Vector Search
    participant Store as Vector Store
    participant Extraction as Chunk Extraction

    Client->>API: request RAG (query, store ids)
    API->>Config: relevance_cutoff_mapping()
    Config-->>API: {vector_db_id: cutoff_score, ...}
    API->>VectorSearch: fetch BYOK RAG (query, stores, cutoffs)

    loop per BYOK store
        VectorSearch->>VectorSearch: determine cutoff_score for store
        VectorSearch->>Store: query(store, params with score_threshold=cutoff_score)
        Store-->>VectorSearch: raw retrieval results (items with raw score)
        VectorSearch->>Extraction: extract_byok_rag_chunks(results, cutoff_score)
        Extraction->>Extraction: filter out raw score < cutoff_score
        Extraction->>Extraction: compute weighted_score (apply score_multiplier)
        Extraction-->>VectorSearch: filtered, weighted chunks
    end

    VectorSearch-->>API: aggregated RAG results
    API-->>Client: response with rag_chunks
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main feature being added: a relevance cutoff score configuration for BYOK RAG, which is the primary change throughout the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/llama_stack_configuration.py`:
- Around line 44-50: The _raw_byok_rag_store_list function must always return a
real list: when raw_byok_rag is a dict, fetch entries =
raw_byok_rag.get("entries") and then validate its type—if entries is a list
return it, if entries is a dict return [entries] (wrap single-entry dict),
otherwise return [] (handle None or strings). Update _raw_byok_rag_store_list
accordingly so the downstream enrich_byok_rag call can safely iterate and call
brag.get(...) on each item.

In `@src/models/config.py`:
- Around line 1626-1630: The normalizer _normalize_byok_rag_input only converts
list inputs but doesn't handle explicit nulls, causing Pydantic to raise a
dict-type error instead of treating the field as the default; update
_normalize_byok_rag_input to explicitly handle None (e.g., if value is None:
return {} or return a structure matching the default like {"entries": []}), keep
existing behavior for list (return {"entries": value}) and for dicts return
value unchanged so explicit nulls are accepted as the default-friendly shape.
- Around line 1658-1663: The helper _default_byok_rag_section is redundantly
re-specifying defaults already defined on ByokRagSection; simplify by returning
ByokRagSection() (or remove the helper and set default_factory=ByokRagSection on
the byok_rag field of Configuration) so defaults live in one place—update the
_default_byok_rag_section function to return ByokRagSection() or switch the
byok_rag field to use default_factory=ByokRagSection and remove the helper.

In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 892-896: The test currently sets
test_config.configuration.byok_rag.relevance_cutoff_score to 0.0 so it doesn't
cover the new rule that raw scores below cutoff must be dropped before applying
score_multiplier; add a new async test (e.g.,
test_query_byok_cutoff_applied_before_multiplier) that sets a non-zero
relevance_cutoff_score (e.g., 0.5), constructs a byok entry with a high
score_multiplier (entry.score_multiplier = 10.0), mocks
AsyncLlamaStackClientHolder and the vector_io.query to return a raw score below
the cutoff (e.g., 0.4), and then calls query_endpoint_handler with QueryRequest
to assert that response.rag_chunks is empty; use existing helpers like
_build_base_mock_client and ensure mock_client.vector_stores.list returns empty
data so the behavior is isolated.
- Around line 268-272: Replace the overly-permissive MagicMock used for
test_config.configuration.byok_rag (_br) with a stricter test double that only
exposes the required attributes (entries and relevance_cutoff_score) to avoid
masking interface regressions; create a simple plain object (e.g., a small
dataclass/Plain object or SimpleNamespace-like fixture) and assign byok_entry to
its entries and 0.0 to its relevance_cutoff_score, then use that object in place
of _br in the test (also update the other occurrences listed: the similar mocks
at lines ~300-304, 434-438, 498-502, 699-703, 818-822, 892-896, 977-981,
1052-1056) so the tests only allow the expected attributes and behaviors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: df544e84-7577-4246-8a5e-eb94d8f8c279

📥 Commits

Reviewing files that changed from the base of the PR and between 0f95aca and 01af5f7.

📒 Files selected for processing (17)
  • docs/byok_guide.md
  • src/app/endpoints/rags.py
  • src/client.py
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • src/utils/responses.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/telemetry/conftest.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: Pyright
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Pylinter
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/client.py
  • src/app/endpoints/rags.py
  • src/configuration.py
  • src/constants.py
  • src/utils/responses.py
  • src/utils/vector_search.py
  • src/llama_stack_configuration.py
  • src/models/config.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/telemetry/conftest.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_vector_search.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/telemetry/conftest.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_vector_search.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/rags.py
tests/**/conftest.py

📄 CodeRabbit inference engine (AGENTS.md)

Use conftest.py for shared pytest fixtures

Files:

  • tests/unit/telemetry/conftest.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic configuration models
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Pydantic configuration models should extend ConfigurationBase

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/config.py
🧠 Learnings (9)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rags.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/constants.py
  • src/utils/vector_search.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • src/models/config.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Central `constants.py` for shared constants with descriptive comments

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests

Applied to files:

  • tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
🔇 Additional comments (11)
src/app/endpoints/rags.py (1)

166-167: Good alignment with ByokRagSection.entries.

This keeps /rags/{rag_id} resolution consistent with the new BYOK config shape.

tests/unit/utils/test_responses.py (2)

93-98: Useful test helper for the new BYOK shape.

Centralizing section mocking (entries + cutoff) improves consistency across test setup.


1654-1656: Nice migration of tests to byok_rag section object.

These updates correctly track runtime usage of configuration.configuration.byok_rag.entries.

Also applies to: 1677-1677, 1707-1709, 1732-1732, 1757-1759, 1777-1777, 1799-1799, 1824-1824, 3372-3372, 3397-3397, 3434-3434, 3493-3493

src/constants.py (1)

189-190: Good constant centralization for cutoff/threshold defaults.

This keeps BYOK and Solr retrieval defaults explicit and reusable.

Also applies to: 195-196

src/configuration.py (1)

482-484: Correct mapping update to byok_rag.entries.

Both ID and score-multiplier mappings now match the new configuration schema.

Also applies to: 509-510

docs/byok_guide.md (1)

82-87: Clear and accurate BYOK cutoff documentation.

The updated guide explains scope, ordering, and config shape well, including legacy compatibility.

Also applies to: 284-304, 330-333

tests/unit/models/config/test_byok_rag.py (1)

9-15: Strong config-model coverage for BYOK schema evolution.

These tests validate both backward compatibility (legacy list) and cutoff validation constraints.

Also applies to: 195-249, 251-257

src/utils/vector_search.py (1)

75-107: Cutoff propagation and ordering are implemented correctly.

The BYOK raw-score filter is applied before weighting and is consistently threaded from config to store queries.

Also applies to: 177-207, 391-420, 546-548

tests/unit/utils/test_vector_search.py (2)

28-33: Good test migration to ByokRagSection shape.

Using section-shaped mocks keeps tests aligned with runtime config access (byok_rag.entries).

Also applies to: 427-427, 446-448, 486-488, 528-530, 573-573, 593-593, 703-704, 723-725


109-114: Nice cutoff behavior coverage.

These tests verify both low-level chunk filtering and end-to-end BYOK cutoff behavior through _fetch_byok_rag.

Also applies to: 134-139, 145-170, 606-644

src/models/config.py (1)

1958-1966: Migration verification complete: all downstream consumers properly access the new section shape.

All references to configuration.byok_rag correctly use .entries or .relevance_cutoff_score. No sites were found that treat byok_rag as a direct list via iteration (for rag in config.byok_rag:) or indexing (config.byok_rag[0]). The constant DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE is properly defined and used throughout.

Comment thread src/models/config.py Outdated
Comment thread src/models/config.py Outdated
Comment thread tests/integration/endpoints/test_query_byok_integration.py Outdated
Comment thread tests/integration/endpoints/test_query_byok_integration.py Outdated
@syedriko syedriko force-pushed the syedriko-lcore-1853 branch from 01af5f7 to 7f34e7b Compare April 17, 2026 03:15
Copy link
Copy Markdown
Contributor

@are-ces are-ces left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, couple of comments to address 👍

Comment thread docs/byok_guide.md Outdated
> vector_db_id: your-index-id # Llama Stack vector store ID (from index generation)
> db_path: /path/to/vector_db/faiss_store.db
> score_multiplier: 1.0 # Optional: weight results when mixing multiple sources
> relevance_cutoff_score: 0.3 # Optional; min raw score per BYOK store before score_multiplier (BYOK only)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to give fine control to users by having a per-store relevance_cutoff_score?

Different vector stores can give very different scores (depending on chunking strategy, embedding model, etc..).

search_response = await client.vector_io.query(
vector_store_id=vector_store_id,
query=query,
params={
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of applying the cutoff after the query, why don't we use the llama-stack parameter?

params={
    "max_chunks": constants.BYOK_RAG_MAX_CHUNKS,
    "mode": "vector",
    "score_threshold": byok_raw_cutoff_score,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we could implement the above solution for tool-based RAG. I don't see why it shouldn't support a cutoff score 😄

@tisnik
Copy link
Copy Markdown
Contributor

tisnik commented Apr 17, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 Security concerns

followed all instructions
No

⚡ No major issues detected

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/utils/vector_search.py (1)

75-130: 🧹 Nitpick | 🔵 Trivial

Consider pushing the cutoff into vector_io.query params (score_threshold) instead of post-filtering.

Right now BYOK_RAG_MAX_CHUNKS candidates are retrieved per store and then filtered client-side, so an aggressive cutoff silently shrinks the returned set even when the store has more above-cutoff chunks available. Passing "score_threshold": byok_raw_cutoff_score (alongside max_chunks and mode) lets llama-stack return up to max_chunks that already satisfy the threshold and removes the need for the per-chunk filter in _extract_byok_rag_chunks. This was raised previously and is still applicable.

Also consider whether Tool RAG (file_search) should support the same cutoff for consistency, since the configuration is per BYOK store and currently only affects Inline RAG.

llama-stack AsyncLlamaStackClient vector_io.query params score_threshold supported

Also applies to: 178-211

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 75 - 130, The current BYOK cutoff is
applied client-side in _extract_byok_rag_chunks by dropping chunks whose score <
byok_raw_cutoff_score; instead pass the cutoff into vector_io.query as the
"score_threshold" (alongside max_chunks and mode) so the remote query returns up
to BYOK_RAG_MAX_CHUNKS already satisfying the threshold, then remove the
per-chunk filter and any use of byok_raw_cutoff_score inside
_extract_byok_rag_chunks (keep computing weighted_score, doc_id, and logging).
Update every call site that queries BYOK stores to include
score_threshold=byok_raw_cutoff_score (and ensure
AsyncLlamaStackClient.vector_io.query supports it), and consider adding the same
score_threshold plumbing for Tool RAG/file_search calls so configuration is
consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/llama_stack_configuration.py`:
- Around line 44-53: The docstring for _raw_byok_rag_store_list is missing the
Google-style Parameters and Returns sections; update the function docstring to
include a "Parameters:" entry describing raw_byok_rag: Any — raw parsed YAML
value (may be list, None, or other types) and a "Returns:" entry specifying
list[Any] — the input list returned as-is when it is a list, otherwise an empty
list (returns [] for None or any non-list); no Raises section is needed because
the function does not raise exceptions.

In `@src/models/config.py`:
- Line 10: Replace the import of Self from typing with the typing_extensions
version: locate the import line that currently reads "from typing import
Annotated, Any, Literal, Optional, Self" in src/models/config.py and change it
to import Self from typing_extensions while keeping the other imports intact so
validators in this module use typing_extensions.Self (i.e., reference Self from
typing_extensions instead of typing).

In `@tests/unit/utils/test_vector_search.py`:
- Around line 28-33: The test helper _byok_rag_list_mock currently assigns a
default relevance_cutoff_score of 0.0 which differs from production; update the
function to set missing or non-numeric entry.relevance_cutoff_score to
constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE instead, and ensure the test
module imports that constant (or the module exposing it) so fixtures mirror
production behavior and don't accidentally disable filtering.

---

Duplicate comments:
In `@src/utils/vector_search.py`:
- Around line 75-130: The current BYOK cutoff is applied client-side in
_extract_byok_rag_chunks by dropping chunks whose score < byok_raw_cutoff_score;
instead pass the cutoff into vector_io.query as the "score_threshold" (alongside
max_chunks and mode) so the remote query returns up to BYOK_RAG_MAX_CHUNKS
already satisfying the threshold, then remove the per-chunk filter and any use
of byok_raw_cutoff_score inside _extract_byok_rag_chunks (keep computing
weighted_score, doc_id, and logging). Update every call site that queries BYOK
stores to include score_threshold=byok_raw_cutoff_score (and ensure
AsyncLlamaStackClient.vector_io.query supports it), and consider adding the same
score_threshold plumbing for Tool RAG/file_search calls so configuration is
consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 10198947-364d-426e-883e-1089aa4f87f9

📥 Commits

Reviewing files that changed from the base of the PR and between 01af5f7 and 89c8e2a.

📒 Files selected for processing (15)
  • docs/byok_guide.md
  • docs/openapi.json
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/configuration.py
  • src/llama_stack_configuration.py
  • src/constants.py
  • src/models/config.py
  • src/utils/vector_search.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic configuration models
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Pydantic configuration models should extend ConfigurationBase

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/config.py
🧠 Learnings (7)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_byok_rag.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Central `constants.py` for shared constants with descriptive comments

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/constants.py
  • tests/unit/utils/test_vector_search.py
  • src/utils/vector_search.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests

Applied to files:

  • tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
🔇 Additional comments (17)
tests/unit/models/config/test_dump_configuration.py (1)

11-11: LGTM — shared default is used consistently.

The dump test now verifies that serialized BYOK RAG entries include relevance_cutoff_score using the centralized default constant.

Also applies to: 991-991

src/configuration.py (1)

511-527: LGTM — cutoff mapping matches the existing configuration access pattern.

The property is typed, guarded, and consistent with score_multiplier_mapping, and it supplies the per-store cutoff needed by BYOK retrieval.

examples/lightspeed-stack-byok-okp-rag.yaml (1)

43-50: LGTM — example configuration reflects the new per-store cutoff.

The example uses the default non-negative cutoff value and makes clear that the setting is scoped to BYOK stores.

tests/unit/test_llama_stack_configuration.py (1)

10-44: LGTM — helper behavior is covered directly.

The added tests pin both supported list input and ignored non-list input, matching the new normalization contract.

tests/integration/endpoints/test_query_byok_integration.py (2)

257-266: LGTM — existing scenarios explicitly disable cutoff filtering.

Setting relevance_cutoff_score = 0.0 keeps these tests focused on their original behavior while satisfying the new BYOK entry interface.

Also applies to: 289-298, 426-432, 493-494, 695-695, 806-812, 879-885, 1021-1021, 1094-1095


942-992: LGTM — regression covers cutoff before weighting.

This test uses a raw score below the cutoff with a large multiplier, so it exercises the key rule that filtering happens before score_multiplier is applied.

src/llama_stack_configuration.py (1)

634-634: LGTM — BYOK enrichment now receives normalized raw input.

The helper keeps non-list byok_rag values from flowing directly into the enrichment path.

tests/integration/endpoints/test_streaming_query_byok_integration.py (1)

239-248: LGTM — streaming fixtures are aligned with the new BYOK field.

The zero cutoff preserves prior test behavior while ensuring the mocked BYOK entries expose the new per-store setting.

Also applies to: 266-275, 346-352, 412-418, 700-700, 756-784, 859-865, 948-948, 1025-1026

src/constants.py (1)

189-190: LGTM — BYOK and OKP score defaults are separated clearly.

The new BYOK cutoff default is centralized, and the Solr threshold comment avoids conflating OKP/Solr filtering with BYOK raw-score filtering.

Also applies to: 195-196

docs/openapi.json (2)

6936-6936: LGTM!

The operationId rename aligns the documentation with the actual POST route.


8077-8083: LGTM!

Schema entry for relevance_cutoff_score is well-specified (non-negative float, default 0.3) and the description clearly scopes filtering to BYOK inline RAG only.

src/models/config.py (3)

1616-1624: LGTM: cutoff field is constrained and documented.

The default comes from constants.py, ge=0 enforces the non-negative requirement, and the description clearly states filtering happens before score weighting.


1636-1657: LGTM: normalization keeps the legacy list shape explicit.

Handling null as [] avoids noisy validation failures, while rejecting top-level mappings prevents ambiguous byok_rag shapes.


1946-1953: LGTM: validated byok_rag wiring is consistent.

The field keeps backward-compatible omission behavior with default_factory=list and documents the cutoff’s BYOK-inline scope.

tests/unit/utils/test_vector_search.py (3)

109-114: LGTM: existing extraction tests opt out of cutoff filtering explicitly.

Passing byok_raw_cutoff_score=0.0 keeps these tests’ prior expectations stable while exercising the new function signature.

Also applies to: 134-139


145-169: LGTM: raw cutoff behavior is covered directly.

The test verifies low raw scores are excluded before the multiplier affects the surviving chunk’s weighted score.


609-648: LGTM: BYOK fetch cutoff wiring is tested.

This covers the configured cutoff path through _fetch_byok_rag and confirms only chunks at or above the raw threshold survive.

Comment thread src/llama_stack_configuration.py
Comment thread src/models/config.py Outdated
Comment thread tests/unit/utils/test_vector_search.py
@syedriko syedriko force-pushed the syedriko-lcore-1853 branch from 89c8e2a to 0e12781 Compare April 17, 2026 23:53
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/openapi.json`:
- Line 6936: The OpenAPI spec has duplicate operationId
"handle_a2a_jsonrpc_a2a_post" for both GET and POST on the /a2a path which
breaks client generation; pick unique operationIds (e.g., rename the GET
operation to "handle_a2a_jsonrpc_a2a_get" and keep the POST as
"handle_a2a_jsonrpc_a2a_post" or vice versa), update the operationId fields in
the /a2a GET and POST operations, and then update any server/client code or
routing that references the old operationId to the new name (search for
operationId "handle_a2a_jsonrpc_a2a_post" and the /a2a path to locate spots to
change).

In `@src/configuration.py`:
- Around line 524-527: The dict comprehension over self._configuration.byok_rag
currently overwrites duplicate brag.vector_db_id entries, giving last-wins
semantics; change it to preserve first-wins by iterating the sequence and only
setting mapping[brag.vector_db_id] = brag.relevance_cutoff_score if the key is
not already present (or alternatively validate and raise on duplicates during
configuration validation). Update the code that builds this mapping (the block
returning the dict) to use an explicit loop that checks existence before
assigning so the first encountered vector_db_id and its relevance_cutoff_score
are kept.

In `@src/models/config.py`:
- Around line 1617-1625: The Field relevance_cutoff_score currently allows
non-finite values (like infinity); add a Pydantic validator on
relevance_cutoff_score (e.g., a method validate_relevance_cutoff_score with
`@validator`('relevance_cutoff_score')) that checks math.isfinite(value) and
raises a ValueError if not finite, keeping the existing ge=0 constraint and same
default (constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) so YAML `.inf` or
other non-finite inputs are rejected with a clear validation error.

In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 991-992: The test currently asserts response.rag_chunks is not
None and len(response.rag_chunks) == 0, which fails to treat None and []
equivalently; replace those two assertions with a single semantic check like
assert not response.rag_chunks (or assert response.rag_chunks in (None, [])) so
the test passes whether rag_chunks is None or an empty list; update the
assertion that references response.rag_chunks in test_query_byok_integration.py
accordingly.

In `@tests/unit/utils/test_responses.py`:
- Around line 93-100: The helper function _byok_rag_list_mock has an unused
_mocker parameter; remove the unused parameter from the function signature and
update any call sites to stop passing the fixture so the helper only accepts
entries: adjust the definition of _byok_rag_list_mock to accept a single
parameter (entries: list[Any]) and remove the corresponding _mocker argument
from tests that call _byok_rag_list_mock; keep the existing logic that sets
entry.relevance_cutoff_score to
constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE when missing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c83d827e-592e-4b8a-ab8d-1b9a1a80a959

📥 Commits

Reviewing files that changed from the base of the PR and between 89c8e2a and 0e12781.

📒 Files selected for processing (15)
  • docs/byok_guide.md
  • docs/openapi.json
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/utils/vector_search.py
  • src/models/config.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic configuration models
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Pydantic configuration models should extend ConfigurationBase

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/config.py
🧠 Learnings (12)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_byok_rag.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Central `constants.py` for shared constants with descriptive comments

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/constants.py
  • src/utils/vector_search.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : All functions require docstrings with brief descriptions

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests

Applied to files:

  • tests/unit/utils/test_vector_search.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `typing_extensions.Self` for model validators in type annotations

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
🔇 Additional comments (21)
tests/unit/models/config/test_dump_configuration.py (1)

11-11: LGTM — serialization expectation tracks the shared default.

Using constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE keeps the dump test aligned with the config model’s default value.

Also applies to: 991-991

tests/unit/test_llama_stack_configuration.py (1)

9-18: LGTM — helper behavior is covered.

The tests verify both supported list input and defensive empty-list fallback for non-list byok_rag values.

Also applies to: 28-44

examples/lightspeed-stack-byok-okp-rag.yaml (1)

43-43: LGTM — example shows the new per-store cutoff.

The BYOK example now demonstrates relevance_cutoff_score alongside score_multiplier, which makes the raw-cutoff-before-weighting behavior discoverable.

Also applies to: 50-50

src/constants.py (1)

189-190: LGTM — cutoff default is centralized separately from OKP/Solr thresholding.

The new BYOK default is clearly separated from the existing Solr score_threshold constant.

Also applies to: 195-196

src/llama_stack_configuration.py (1)

44-63: LGTM — BYOK enrichment now receives only the supported raw list form.

The helper keeps generate_configuration() from passing non-list byok_rag shapes into enrichment, and the new unit tests cover the behavior.

Also applies to: 644-644

docs/byok_guide.md (1)

82-82: LGTM on docs.

The new per-store relevance_cutoff_score documentation is clear, covers the default, the 0.0 disable sentinel, the ordering relative to score_multiplier, and the scope (BYOK inline only, not OKP, not Tool RAG). This also addresses the prior feedback requesting per-store control instead of a global setting.

tests/unit/models/config/test_byok_rag.py (1)

196-282: LGTM on new validation tests.

Good coverage of the new behavior: None[] normalization, list-of-stores parsing with default cutoff application, rejection of the mapping shape, and negative-cutoff rejection on individual entries. Tests use pytest/pytest.raises(ValidationError) per coding guidelines.

docs/openapi.json (2)

8077-8083: Schema addition for relevance_cutoff_score is well-specified.

Constraints and semantics are clear (type: number, minimum: 0.0, explicit behavior description, sensible default).


8261-8261: Documentation clarification is accurate and helpful.

The updated description clearly communicates list usage and scope of relevance_cutoff_score.

src/models/config.py (3)

10-30: LGTM: validator and Self imports match the model conventions.

Self now comes from typing_extensions, and BeforeValidator is imported for the BYOK list normalizer. Based on learnings, Applies to src/models/**/*.py : Use typing_extensions.Self for model validators in type annotations.


1637-1658: LGTM: BYOK input normalization is explicit.

None is normalized to an empty list, list inputs are preserved, and invalid top-level shapes fail before element validation.


1947-1953: LGTM: byok_rag schema now reflects the list shape and per-store cutoff.

The field type and description align with the new BYOK per-store cutoff behavior.

src/utils/vector_search.py (4)

75-130: LGTM: cutoff is applied before score weighting.

The raw score is compared with byok_raw_cutoff_score before weighted_score is computed, which matches the intended ordering.


178-208: LGTM: BYOK query helper threads the cutoff through cleanly.

The helper keeps retrieval separate from post-processing and passes the per-store cutoff into extraction.


351-463: LGTM: per-store cutoff mapping is applied before final ranking.

The flow resolves eligible BYOK stores, applies each store’s raw cutoff, then ranks surviving chunks by weighted score.


542-553: LGTM: BYOK and OKP cutoff behavior is documented separately.

The docstring makes it clear that BYOK uses relevance_cutoff_score, while OKP/Solr uses the existing query-parameter threshold path.

tests/unit/utils/test_vector_search.py (5)

3-35: LGTM: test fixture now mirrors the production cutoff default.

Missing mocked relevance_cutoff_score values fall back to constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE, avoiding accidental test-only filtering disablement.


111-116: LGTM: existing extraction tests preserve prior behavior explicitly.

Passing byok_raw_cutoff_score=0.0 keeps these tests focused on metadata and weighting behavior.

Also applies to: 136-141


147-171: LGTM: cutoff filtering is covered at extraction level.

The test verifies that low raw scores are excluded before the surviving score is multiplied.


427-453: LGTM: BYOK fetch fixtures include cutoff mappings.

The updated mocks now provide the new relevance_cutoff_mapping dependency alongside score and RAG ID mappings.

Also applies to: 485-495, 525-537, 576-598, 707-735


611-650: LGTM: fetch-level regression test covers configured cutoff behavior.

This verifies _fetch_byok_rag drops chunks below the per-store raw cutoff before returning context chunks.

Comment thread docs/openapi.json Outdated
Comment thread src/configuration.py Outdated
Comment thread src/models/config.py
Comment thread tests/integration/endpoints/test_query_byok_integration.py Outdated
Comment thread tests/unit/utils/test_responses.py Outdated
@syedriko syedriko force-pushed the syedriko-lcore-1853 branch from 0e12781 to e371ef4 Compare April 18, 2026 00:15
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/responses.py`:
- Around line 670-715: The code currently combines multiple vector stores into a
single SearchRankingOptions by using max(thresholds) in
file_search_ranking_options_for_vector_stores, which enforces the highest cutoff
across all stores; change the logic to produce one InputToolFileSearch per
vector_store_id with its own SearchRankingOptions.score_threshold (use the
store-specific cutoff: for SOLR use SOLR_VECTOR_SEARCH_DEFAULT_SCORE_THRESHOLD,
for BYOK use br.relevance_cutoff_score, otherwise
DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) instead of a global max; also when
translating explicit client-provided file_search tools (the code path that
checks file_tool.ranking_options and calls
file_tool.model_copy(update=updates)), enforce the store-specific cutoff as a
floor (i.e., set score_threshold = max(client_threshold, store_cutoff) per
vector_store_id) and split/emit separate tool instances for each resolved
vector_store_id so each tool call uses the correct per-store ranking_options.
- Line 86: The import is using a submodule path for SearchRankingOptions which
is unsupported; update the import to use the flat root-package import by
replacing the submodule import of SearchRankingOptions in src/utils/responses.py
with a root-level import from llama_stack_api so the module imports
SearchRankingOptions directly from the package root (update the import statement
referencing SearchRankingOptions accordingly).

In `@src/utils/vector_search.py`:
- Around line 399-419: The relevance cutoff fallback is unreachable and should
be removed: in the loop that calls _query_store_for_byok_rag, replace
relevance_cutoff_mapping.get(vector_store_id,
constants.DEFAULT_BYOK_RAG_RELEVANCE_CUTOFF_SCORE) with a direct lookup
relevance_cutoff_mapping[vector_store_id] (since vector_store_ids_to_query is
derived from configuration.byok_rag via resolve_vector_store_ids); this makes
intent explicit and allows misconfigurations to surface as errors instead of
silently using the default.

In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 968-991: The test's vector_io.query mock currently always returns
a low-score chunk (0.4) which ignores the production contract of honoring
score_threshold; change the mock so mock_client.vector_io.query is an AsyncMock
with a side_effect that reads the incoming score_threshold kwarg/arg and returns
the low_score_resp only when score_threshold <= 0.4, otherwise returns an
empty/_make_vector_io_response(..., []) result; use the existing
_make_vector_io_response helper to construct both responses and keep references
to mock_client.vector_io.query, _make_vector_io_response, and QueryRequest to
locate and implement the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 16fb90e9-2655-4a90-8099-91f1b6d7249d

📥 Commits

Reviewing files that changed from the base of the PR and between 0e12781 and c535d8e.

📒 Files selected for processing (18)
  • docs/byok_guide.md
  • docs/openapi.json
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • src/app/endpoints/a2a.py
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • src/utils/responses.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (6)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_vector_search.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_vector_search.py
  • tests/unit/utils/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/utils/responses.py
  • src/models/config.py
  • src/utils/vector_search.py
  • src/app/endpoints/a2a.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic configuration models
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Pydantic configuration models should extend ConfigurationBase

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/config.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/a2a.py
🧠 Learnings (17)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • src/models/config.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Central `constants.py` for shared constants with descriptive comments

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/config.py : Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields

Applied to files:

  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/constants.py
  • docs/byok_guide.md
  • src/utils/vector_search.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Include `Parameters:`, `Returns:`, `Raises:` sections in function docstrings as needed

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : All functions require docstrings with brief descriptions

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests

Applied to files:

  • tests/unit/utils/test_vector_search.py
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/utils/responses.py
  • tests/unit/utils/test_responses.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports

Applied to files:

  • src/utils/responses.py
  • tests/unit/utils/test_responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `typing_extensions.Self` for model validators in type annotations

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/a2a.py
🪛 GitHub Actions: Integration tests
tests/integration/endpoints/test_query_byok_integration.py

[error] 991-991: Test failed: cutoff should be applied before score_multiplier. AssertionError: expected response.rag_chunks to be empty, but got [RAGChunk(content='Content below raw cutoff', source='cutoff-test', score=4.0, attributes={'document_id': 'doc-below'})].


[warning] 1-1: ResourceWarning: unclosed database in sqlite3.Connection object (reported from logging/init.py).

🔇 Additional comments (18)
tests/unit/models/config/test_dump_configuration.py (1)

11-11: LGTM: serialization now checks the centralized BYOK cutoff default.

Using the shared constant keeps the dump expectation aligned with the config model default. Based on learnings, Central constants.py for shared constants with descriptive comments.

Also applies to: 991-991

examples/lightspeed-stack-byok-okp-rag.yaml (1)

43-50: LGTM: the BYOK example now shows the new cutoff setting.

Both sample stores include the per-store cutoff, and the inline comment correctly documents that it is applied before score_multiplier.

tests/integration/endpoints/test_query_byok_integration.py (1)

253-267: LGTM: explicit zero cutoffs preserve existing test behavior.

These fixtures keep non-cutoff-focused tests exercising the pre-existing BYOK behavior while satisfying the new config shape.

Also applies to: 285-299, 422-432, 493-494, 691-695, 802-812, 875-885, 1016-1020, 1093-1094

tests/unit/test_llama_stack_configuration.py (1)

9-18: LGTM: BYOK raw-list normalization is covered.

The tests verify both accepted list input and safe fallback for non-list forms.

Also applies to: 28-45

tests/integration/endpoints/test_streaming_query_byok_integration.py (1)

235-248: LGTM: streaming BYOK mocks include the new cutoff field.

Using 0.0 keeps these existing streaming scenarios focused on their original behavior while matching the updated BYOK entry interface.

Also applies to: 262-275, 342-352, 408-418, 696-700, 756-784, 855-865, 944-948, 1025-1026

tests/unit/test_configuration.py (1)

1189-1225: LGTM: duplicate cutoff mapping behavior is covered.

This test locks in first-wins semantics for repeated vector_db_id values.

src/configuration.py (1)

511-531: LGTM: cutoff mapping preserves first-wins semantics.

The explicit loop avoids overwriting duplicate vector_db_id entries and matches the documented behavior.

src/constants.py (1)

189-196: LGTM: the cutoff default is centralized with related RAG constants.

The Solr threshold comment also clarifies scope without changing behavior.

src/utils/vector_search.py (1)

188-197: Score_threshold is properly supported across all BYOK vector_io providers.

The score_threshold parameter is natively supported by all three BYOK-compatible providers—inline::faiss, inline::sqlite-vec, and remote::pgvector—with consistent semantics across the unified VectorIORouter. Each provider treats it as a post-filter threshold on normalized similarity scores (higher = better match), ensuring that moving the cutoff server-side via params["score_threshold"] applies uniformly to every BYOK query without risk of silent bypass or provider-specific inconsistency.

docs/openapi.json (4)

6934-6936: LGTM — GET operation is now distinct.

The GET metadata clearly describes the /a2a GET handling, and the operationId is unique from POST in the shown spec.


6952-6954: LGTM — POST operationId is unique.

This addresses the client-generation risk from duplicate /a2a operation IDs.


8077-8083: LGTM — cutoff schema is clear and constrained.

The new field documents the raw-score cutoff semantics, non-negative bound, default, and BYOK-only scope.


8261-8261: LGTM — parent schema description matches the per-store model.

Calling out byok_rag: [ ... ] reduces ambiguity about where relevance_cutoff_score belongs.

tests/unit/utils/test_vector_search.py (2)

28-35: LGTM — fixture now mirrors the production cutoff default.

The helper keeps mocked BYOK entries aligned with the configured default instead of silently disabling filtering.


611-646: LGTM — cutoff forwarding is covered.

This test verifies that the configured BYOK cutoff is sent to vector_io.query as score_threshold, which matches the feature path.

src/models/config.py (2)

1618-1649: LGTM — cutoff validation is appropriately bounded.

The field enforces non-negative values and rejects inf/nan, preventing silent “drop everything” configurations.


1661-1682: LGTM — BYOK input normalization is explicit.

Handling null as an empty list while rejecting top-level mappings keeps the accepted YAML shape clear.

src/app/endpoints/a2a.py (1)

695-729: LGTM — route split keeps behavior shared while fixing operation IDs.

Both GET and POST preserve authorization and delegate into the common JSON-RPC handler.

Comment thread src/utils/responses.py Outdated
Comment thread src/utils/responses.py Outdated
Comment thread src/utils/vector_search.py
Comment thread tests/integration/endpoints/test_query_byok_integration.py
@syedriko syedriko force-pushed the syedriko-lcore-1853 branch from c535d8e to 579c17e Compare April 18, 2026 01:40
@syedriko syedriko marked this pull request as draft April 21, 2026 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants