Conversation
WalkthroughThe changes refactor URL construction logic in two configuration and utility functions to use Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llama_stack_configuration.py`:
- Around line 472-475: The current solr_url construction uses urljoin(base_url,
"/solr") which treats "/solr" as an absolute path and discards path segments in
base_url; update the solr_url logic to ensure base_url ends with a trailing
slash (when using rhokp_raw or constants.RH_SERVER_OKP_DEFAULT_URL) and call
urljoin with the relative segment "solr" instead of "/solr" (i.e., produce
solr_url using base_url if base_url.endswith("/") else base_url + "/" and
urljoin(..., "solr")); adjust the code where base_url is defined (variables:
rhokp_raw, base_url) and where solr_url is set (solr_url, urljoin) accordingly.
In `@src/utils/vector_search.py`:
- Line 10: _join_okp_doc_url currently allows
absolute/scheme-relative/absolute-path references which can escape the
configured OKP origin; fix it in _join_okp_doc_url by parsing the incoming
reference (using urllib.parse.urlparse), stripping any scheme and netloc,
removing leading slashes from the parsed.path, and then reconstructing a
relative reference from path (+ params/query/fragment if needed) before calling
urljoin with the configured base; this ensures references derived from
parent_id/doc_id/reference_url remain origin-relative and cannot override the
OKP host.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5cdcedac-adad-4447-95b6-2f2af373f693
📒 Files selected for processing (2)
src/llama_stack_configuration.pysrc/utils/vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Pylinter
- GitHub Check: integration_tests (3.12)
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: integration_tests (3.13)
- GitHub Check: Pyright
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: mypy
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Import FastAPI dependencies with:from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with:from llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Usetyping_extensions.Selffor model validators in Pydantic models
Use modern union type syntaxstr | intinstead ofUnion[str, int]
UseOptional[Type]for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Useasync deffor I/O operations and external API calls
HandleAPIConnectionErrorfrom Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with@abstractmethoddecorators
Use@model_validatorand@field_validatorfor Pydantic model validation
Complete type annotations for all class attributes; use specific types, notAny
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
src/llama_stack_configuration.pysrc/utils/vector_search.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend
ConfigurationBasefor config,BaseModelfor data models
Files:
src/llama_stack_configuration.pysrc/utils/vector_search.py
| base_url = ( | ||
| str(rhokp_raw) if rhokp_raw is not None else constants.RH_SERVER_OKP_DEFAULT_URL | ||
| ) | ||
| solr_url = urljoin(base_url, "/solr") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from urllib.parse import urljoin
cases = [
("${env.RH_SERVER_OKP}", "/solr"),
("${env.RH_SERVER_OKP}/", "solr"),
("https://okp.example.com/prefix", "/solr"),
("https://okp.example.com/prefix/", "solr"),
]
for base, ref in cases:
print(f"urljoin({base!r}, {ref!r}) -> {urljoin(base, ref)!r}")
PYRepository: lightspeed-core/lightspeed-stack
Length of output: 376
🏁 Script executed:
# Verify the file exists and check the specific lines
head -500 src/llama_stack_configuration.py | tail -50Repository: lightspeed-core/lightspeed-stack
Length of output: 2005
🏁 Script executed:
# Search for the specific code pattern and context
rg -n "urljoin.*solr" -A 2 -B 2 src/Repository: lightspeed-core/lightspeed-stack
Length of output: 455
🏁 Script executed:
# Check the constants file for RH_SERVER_OKP_DEFAULT_URL
rg -n "RH_SERVER_OKP_DEFAULT_URL" src/Repository: lightspeed-core/lightspeed-stack
Length of output: 375
🏁 Script executed:
# Get more context around lines 472-475
sed -n '460,485p' src/llama_stack_configuration.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 906
Use relative path segment to preserve base URL.
Line 475 uses an absolute path /solr in urljoin, which resets the URL to the root. For base URLs like "${env.RH_SERVER_OKP}" or "https://host/okp", this drops the base entirely or the path prefix. Ensure the base URL has a trailing slash and use "solr" as a relative segment:
solr_url = urljoin(base_url if base_url.endswith("/") else base_url + "/", "solr")This preserves the full base URL in all cases: "${env.RH_SERVER_OKP}/solr" and "https://host/okp/solr".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llama_stack_configuration.py` around lines 472 - 475, The current
solr_url construction uses urljoin(base_url, "/solr") which treats "/solr" as an
absolute path and discards path segments in base_url; update the solr_url logic
to ensure base_url ends with a trailing slash (when using rhokp_raw or
constants.RH_SERVER_OKP_DEFAULT_URL) and call urljoin with the relative segment
"solr" instead of "/solr" (i.e., produce solr_url using base_url if
base_url.endswith("/") else base_url + "/" and urljoin(..., "solr")); adjust the
code where base_url is defined (variables: rhokp_raw, base_url) and where
solr_url is set (solr_url, urljoin) accordingly.
There was a problem hiding this comment.
RHOKP's internal Solr service is always at /solr and can't be at a subpatch like https://host/okp/solr. If that ever changes, then we can update this code.
There was a problem hiding this comment.
@mwcz, thanks for the clarification! That makes sense — since the Solr service is always rooted at /solr on the RHOKP host, using an absolute path with urljoin is the correct behavior here. My concern doesn't apply. The current implementation is correct.
(∪・ω・∪)
✏️ Learnings added
Learnt from: mwcz
Repo: lightspeed-core/lightspeed-stack PR: 1561
File: src/llama_stack_configuration.py:472-475
Timestamp: 2026-04-21T15:17:16.516Z
Learning: In the lightspeed-stack repository (src/llama_stack_configuration.py), the RHOKP internal Solr service is always located at the absolute path `/solr` on the host (e.g., `https://host/solr`). It is never at a subpath like `/okp/solr`. Therefore, using `urljoin(base_url, "/solr")` is intentional and correct — the absolute `/solr` path should not be changed to a relative segment.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| import asyncio | ||
| import traceback | ||
| from typing import Any, Optional, cast | ||
| from urllib.parse import urljoin |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from urllib.parse import urljoin
base = "https://okp.example.com/prefix"
references = [
"/articles/123",
"articles/123",
"//evil.example/articles/123",
"https://evil.example/articles/123",
]
for reference in references:
print(f"urljoin({base!r}, {reference!r}) -> {urljoin(base, reference)!r}")
PYRepository: lightspeed-core/lightspeed-stack
Length of output: 506
🏁 Script executed:
# First, check if the file exists and examine the import and function
fd vector_search.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 141
🏁 Script executed:
# Check the file size first
wc -l src/utils/vector_search.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 106
🏁 Script executed:
# Read the import section around line 10
head -20 src/utils/vector_search.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 656
🏁 Script executed:
# Read the function around lines 568-580
sed -n '560,590p' src/utils/vector_search.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 967
🏁 Script executed:
# Find all calls to _join_okp_doc_url
rg "_join_okp_doc_url" -B 3 -A 3Repository: lightspeed-core/lightspeed-stack
Length of output: 1787
🏁 Script executed:
# Find _get_okp_base_url and understand the base URL configuration
rg "_get_okp_base_url" -B 5 -A 10Repository: lightspeed-core/lightspeed-stack
Length of output: 7860
🏁 Script executed:
# Look at the _build_document_url function context more broadly
sed -n '582,620p' src/utils/vector_search.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 1321
🏁 Script executed:
# Check for tests of _join_okp_doc_url and related functionality
rg "join_okp_doc_url\|_build_document_url" tests/ -B 2 -A 10Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Look at the actual metadata sources to understand the data flow
rg "parent_id\|reference_url\|chunk.metadata" src/utils/vector_search.py -B 2 -A 2Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check the test file directly for tests related to _join_okp_doc_url
grep -n "join_okp_doc_url\|_build_document_url" tests/unit/utils/test_vector_search.py -B 3 -A 10Repository: lightspeed-core/lightspeed-stack
Length of output: 2750
🏁 Script executed:
# Look at parent_id and reference_url usage in the code
sed -n '595,650p' src/utils/vector_search.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 2140
Enforce origin-relative references in _join_okp_doc_url to prevent URL hijacking.
The function's docstring promises "origin-relative document path," but the implementation accepts absolute URLs (https://...), scheme-relative URLs (//...), and absolute paths (/...), all of which override the configured base. If reference parameter values (from parent_id, doc_id, or reference_url metadata) contain such URLs—whether through API poisoning or data corruption—the function silently generates URLs outside the intended OKP host.
Strip any scheme and netloc from the reference before joining against the base URL, and remove leading slashes to ensure the reference remains relative:
Proposed fix
-from urllib.parse import urljoin
+from urllib.parse import urljoin, urlsplit, urlunsplit def _join_okp_doc_url(base_url: AnyUrl, reference: Optional[str]) -> str:
"""Build a well-formed document URL from base and reference path.
@@
Returns:
Well-formed doc_url string, or empty string if reference is empty.
"""
- if not reference:
+ if reference is None:
return ""
- return urljoin(str(base_url), reference)
+
+ reference = reference.strip()
+ if not reference:
+ return ""
+
+ parsed_reference = urlsplit(reference)
+ if parsed_reference.scheme or parsed_reference.netloc:
+ reference = urlunsplit(
+ (
+ "",
+ "",
+ parsed_reference.path,
+ parsed_reference.query,
+ parsed_reference.fragment,
+ )
+ )
+
+ reference = reference.lstrip("/")
+ if not reference:
+ return ""
+
+ return urljoin(f"{str(base_url).rstrip('/')}/", reference)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/vector_search.py` at line 10, _join_okp_doc_url currently allows
absolute/scheme-relative/absolute-path references which can escape the
configured OKP origin; fix it in _join_okp_doc_url by parsing the incoming
reference (using urllib.parse.urlparse), stripping any scheme and netloc,
removing leading slashes from the parsed.path, and then reconstructing a
relative reference from path (+ params/query/fragment if needed) before calling
urljoin with the configured base; this ensures references derived from
parent_id/doc_id/reference_url remain origin-relative and cannot override the
OKP host.
This is an alternative to #1431
Summary
RH_SERVER_OKPURL and the Solr RAG URL.RH_SERVER_OKPis meant to be the base URL of the OKP instance, and reference links can be joined onto it (ex:$(RH_SERVER_OKP)/articles/1234), and the Solr URL can also be joined onto it (ex:$(RH_SERVER_OKP)/solr).urllib.parse.urljoinin bothllama_stack_configuration.py(solr URL fromrhokp_url) andutils/vector_search.py(OKP document URLs).urljoinhandles all of that correctly.Test plan
rhokp_urlconfiguv run make test-unitto confirm no regressions🤖 Generated with Claude Code
Summary by CodeRabbit