
feat: Downstream llm-d #22

Open
albertoperdomo2 wants to merge 10 commits into openshift-psap:main from albertoperdomo2:feat/downstream-llm-d

Conversation

@albertoperdomo2
Collaborator

@albertoperdomo2 albertoperdomo2 commented Apr 13, 2026

Summary by CodeRabbit

  • New Features

    • Added comprehensive runtime orchestration for LLM D deployment on Kubernetes/OpenShift clusters
    • Implemented model caching for HuggingFace and OCI registry models with PVC-based storage
    • Added multiple scheduler profiles (approximate, precise) for inference optimization
    • Introduced new runtime presets (smoke, benchmark-short) for different workload configurations
    • Added preparation, testing, and cleanup toolbox workflows
    • Implemented smoke testing and GuideLLM benchmarking capabilities
  • Documentation

    • Updated project README with llm_d-specific orchestration scope

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2026
@coderabbitai

coderabbitai bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR expands the llm_d project from a skeleton template into a fully-featured LLM inference orchestration system. It introduces a comprehensive runtime configuration module, Kubernetes cluster orchestration workflows (prepare/test/cleanup), model-cache management, multiple toolbox entrypoints, Kubernetes manifests for components, and modularizes configuration via config.d/ fragments with a merge-capable loader.

Changes

Cohort / File(s) Summary
README & Top-Level Configuration
projects/llm_d/README.md, projects/llm_d/orchestration/config.yaml
Replaced generic skeleton guide with llm_d-specific description; condensed main config to minimal project metadata and delegated settings to modular config.d/ files.
Modular Configuration Chunks
projects/llm_d/orchestration/config.d/model_cache.yaml, config.d/models.yaml, config.d/platform.yaml, config.d/runtime.yaml, config.d/scheduler_profiles.yaml, config.d/workloads.yaml
Six new YAML configuration files defining model definitions, caching behavior, platform prerequisites, runtime presets, scheduler profiles, and workload (smoke/benchmark) parameters.
Orchestration Runtime Engine
projects/llm_d/orchestration/llmd_runtime.py
New ~1240-line module implementing configuration resolution, Kubernetes/OpenShift helpers, operator installation, manifest rendering, model-cache orchestration, and benchmark/smoke-request job builders.
CLI & CI Entry Points
projects/llm_d/orchestration/ci.py, projects/llm_d/orchestration/cli.py
Updated command handlers with explicit -> int return type annotations; removed sys.exit() calls in favor of direct returns from delegated functions.
Orchestration Delegates
projects/llm_d/orchestration/prepare_llmd.py, projects/llm_d/orchestration/test_llmd.py
Converted stubs to thin wrappers delegating to toolbox entrypoints (prepare_toolbox_run(), test_toolbox_run(), cleanup_toolbox_run()).
Toolbox Implementations
projects/llm_d/toolbox/prepare/main.py, toolbox/test/main.py, toolbox/cleanup/main.py, toolbox/prepare_model_cache/main.py
Four new toolbox modules implementing full prepare (cluster bootstrap), test (inference service deployment and smoke test), cleanup (resource deletion), and model-cache orchestration workflows.
Kubernetes Manifests
projects/llm_d/orchestration/manifests/datasciencecluster.yaml, manifests/gateway.yaml, manifests/gpu-clusterpolicy.yaml, manifests/llminferenceservice.yaml, manifests/nfd-nodefeaturediscovery.yaml
Five new manifest templates for OpenDataHub DataScienceCluster, Kubernetes Gateway, NVIDIA GPU operator policy, KServe LLM inference service, and Node Feature Discovery resources.
Scheduler Profile Definitions
projects/llm_d/orchestration/scheduler_profiles/approximate.yaml, scheduler_profiles/precise.yaml, presets.d/scheduler_profiles.yaml
Three scheduler configuration files defining approximate (queue/KV-cache/prefix-cache scorer weights) and precise (tokenizer-aware prefix-cache scorer) endpoint-picker strategies.
Presets Configuration
projects/llm_d/orchestration/presets.d/presets.yaml, presets.d/cks.yaml
Updated presets: removed legacy pvc_rwx and llama-70b presets; added runtime-oriented smoke, smoke-precise, smoke-default-scheduler, and benchmark-short presets; deleted cks.yaml.
Core Library Updates
projects/core/library/config.py, projects/core/dsl/log.py, projects/core/dsl/runtime.py
Enhanced config loader to merge config.yaml with config.d/*.yaml chunk files; updated documentation references in dsl modules to reflect capture_llmisvc_state.
Tests & Toolbox Support
tests/llm_d/test_runtime.py, projects/llm_d/toolbox/capture_llmisvc_state/main.py
Comprehensive test suite for runtime config, manifest rendering, job orchestration, and scheduler selection; minor refactoring of capture toolbox for clarity.
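The merge-capable config.d loader mentioned in the table above can be sketched as follows. This is a minimal illustration only, not the actual code in projects/core/library/config.py; deep_merge and load_config are hypothetical names, and the sketch operates on already-parsed dicts rather than reading YAML from disk:

```python
import copy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay into a copy of base; overlay wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(config: dict, chunks: list[dict]) -> dict:
    """Merge the main config.yaml content with config.d/*.yaml chunks, in order."""
    merged = config
    for chunk in chunks:
        merged = deep_merge(merged, chunk)
    return merged
```

Applying chunks in order means a later config.d fragment can override a single nested key without restating the whole section.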

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Prepare as Prepare Entrypoint
    participant Runtime as llmd_runtime
    participant K8s as Kubernetes/OpenShift
    participant Operators as Operators & CRDs
    participant Components as Cluster Components
    
    User->>Prepare: invoke prepare
    Prepare->>Runtime: load_run_configuration()
    Runtime->>K8s: validate oc access & cluster version
    Prepare->>Operators: ensure cert-manager, leader-worker-set
    Operators->>K8s: apply subscriptions & CRDs
    K8s->>Operators: wait for CSV readiness
    Prepare->>Operators: ensure NFD operator with bootstrap
    Operators->>K8s: wait for GPU discovery labels
    Prepare->>Operators: ensure GPU operator, render ClusterPolicy
    K8s->>Components: apply GPU manifest
    Components->>K8s: reach Ready state
    Prepare->>Components: render & apply DataScienceCluster
    K8s->>Components: wait until Ready
    Prepare->>Components: ensure/create Gateway, wait Programmed
    Prepare->>Runtime: prepare model cache
    Prepare->>K8s: verify GPU nodes exist
    Prepare->>User: capture final state, return success
sequenceDiagram
    participant User
    participant Test as Test Entrypoint
    participant Runtime as llmd_runtime
    participant K8s as Kubernetes/OpenShift
    participant Registry as Model Registry/HF
    participant Service as InferenceService
    participant Benchmark as GuideLLM Benchmark
    
    User->>Test: invoke test
    Test->>Runtime: load_run_configuration()
    Test->>K8s: delete old InferenceService
    Test->>K8s: wait for old pods gone
    Test->>K8s: apply rendered llminferenceservice.yaml
    K8s->>Service: create pods
    Test->>Service: wait for Ready condition
    Service->>Registry: load model from cache/download
    Test->>K8s: resolve Gateway endpoint URL
    Test->>Test: build JSON payload, write smoke request
    Test->>K8s: apply smoke-request Job
    K8s->>Service: curl smoke endpoint
    Service->>K8s: return response
    Test->>K8s: capture smoke job logs
    alt benchmark enabled
        Test->>Benchmark: apply benchmark PVC & Job
        K8s->>Benchmark: run GuideLLM benchmark
        Benchmark->>Service: send requests, collect metrics
        Test->>K8s: apply copy helper pod
        K8s->>Benchmark: exec extract results
        Test->>K8s: delete benchmark resources
    end
    Test->>User: capture inference state, return success

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

🐰 Hops through the config with joy and delight,
Merging the chunks from left and from right!
Prepare, test, cleanup—a symphony grand,
llm_d now orchestrates across the whole land!
Models cache swiftly, benchmarks take flight,
The rabbit's work shines: refactored, brand new, and right! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 14.36%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Title check: ✅ Passed. The PR title 'feat: Downstream llm-d' accurately reflects the primary change: implementation of downstream llm-d orchestration with new runtime configuration, toolbox modules, and Kubernetes manifests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 13, 2026
@openshift-ci

openshift-ci bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mml-coder for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
projects/llm_d/toolbox/capture_isvc_state/main.py (1)

285-293: ⚠️ Potential issue | 🔴 Critical

Replace undefined capture_isvc_state with run function on lines 285 and 292.

Lines 285 and 292 reference capture_isvc_state, which is not defined or imported in this module. The function should call run instead, which is the actual function defined in this module with the matching signature.

🔧 Proposed fix
-    parser = create_dynamic_parser(capture_isvc_state, positional_args=["llmisvc_name"])
+    parser = create_dynamic_parser(run, positional_args=["llmisvc_name"])
     args = parser.parse_args()
 
     # Convert args to kwargs for function call
     kwargs = vars(args)
     env.init(daily_artifact_dir=True)
     try:
-        capture_isvc_state(**kwargs)
+        run(**kwargs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/capture_isvc_state/main.py` around lines 285 - 293,
The code calls an undefined symbol capture_isvc_state when building the CLI and
invoking the function; replace both references to capture_isvc_state with the
actual function name run (used by create_dynamic_parser and the invocation), so
that parser = create_dynamic_parser(run, positional_args=["llmisvc_name"]) and
the try block calls run(**kwargs) after env.init(daily_artifact_dir=True),
leaving args parsing and kwargs conversion unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/llm_d/manifests/gateway.yaml`:
- Around line 12-14: The Gateway's allowedRoutes.namespaces is set to "from:
All", which permits any namespace to bind routes to this listener; change the
setting in the Gateway manifest (allowedRoutes -> namespaces -> from) from "All"
to "Same" to enforce namespace isolation for this listener (unless you have a
documented, explicit need for cross-namespace routing, in which case add
justification and tighten other Gateway controls).

In `@config/llm_d/manifests/llminferenceservice.yaml`:
- Around line 78-96: The current extreme failureThresholds on livenessProbe and
readinessProbe (livenessProbe.failureThreshold and
readinessProbe.failureThreshold) will hide real outages—replace the long-tailing
behavior by adding a startupProbe for the long model cold-start and revert
livenessProbe and readinessProbe to responsive values (e.g., small
failureThreshold like 3 and shorter periodSeconds/timeouts) so restarts and
traffic removal happen quickly; specifically, introduce a startupProbe that
targets the same /health endpoint on port 8000 with long
initialDelaySeconds/periodSeconds suitable for model load, and update
livenessProbe and readinessProbe in the manifest to remove the huge
failureThresholds and long delays so they remain sensitive once the startupProbe
succeeds.

In `@config/llm_d/models.yaml`:
- Around line 16-25: The llama model entry "llama-3-1-8b-instruct-fp8" is
missing resources.limits.cpu and resources.limits.memory so llmd_runtime.py's
replacement of the entire resources block results in no CPU/memory limits being
applied; update the model stanza to include limits.cpu and limits.memory (e.g.,
set limits.cpu: "4" and limits.memory: 8Gi to match the requests) while keeping
the existing nvidia.com/gpu limits, so resources.requests and resources.limits
are fully specified for "llama-3-1-8b-instruct-fp8".

In `@projects/llm_d/orchestration/llmd_runtime.py`:
- Around line 326-341: The wait loop in wait_until is swallowing terminal
RuntimeError from predicates like wait_for_datasciencecluster_ready; change the
except block inside wait_until so that after catching Exception as exc it
immediately re-raises if isinstance(exc, RuntimeError) (or any other terminal
exception class you choose), otherwise treat it as a transient error (log and
continue as now). Update the except clause in wait_until (the try/except
surrounding predicate()) to raise exc when it's a RuntimeError, and only set
last_error/log and sleep for non-terminal exceptions.
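The re-raise behavior proposed for wait_until can be sketched as below. The signature and the injected clock/sleep hooks are illustrative (they make the sketch testable), not the actual llmd_runtime.py code:

```python
import time


def wait_until(predicate, timeout_s=60.0, interval_s=2.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate until truthy; RuntimeError is terminal, other errors are transient."""
    deadline = clock() + timeout_s
    last_error = None
    while clock() < deadline:
        try:
            if predicate():
                return True
        except RuntimeError:
            raise  # terminal condition signaled by the predicate: fail fast
        except Exception as exc:
            last_error = exc  # transient: remember, retry after the interval
        sleep(interval_s)
    raise TimeoutError(f"condition not met within {timeout_s}s; last error: {last_error}")
```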
- Around line 654-666: The init container "permission-fixer" currently requests
root via securityContext keys runAsUser: 0 and allowPrivilegeEscalation: True
which will be rejected by OpenShift's restricted SCC; update the pod spec
generation in llmd_runtime.py to avoid root: either (A) remove runAsUser: 0 and
allowPrivilegeEscalation and instead set a pod-level securityContext.fsGroup
(e.g., fsGroup: 1001) so the container can chmod/chown mounted PVCs without
running as root, ensure the init container runs as a non-root uid (e.g., 1001)
and keep the command unchanged, or (B) if root is unavoidable, set an explicit
serviceAccountName and document that the service account must be bound to an
appropriate SCC — reference the "permission-fixer" init container, the runAsUser
and allowPrivilegeEscalation fields, the pod securityContext.fsGroup option, and
serviceAccountName when making the patch.
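Option (A) from the note above can be sketched as a pod-spec fragment. This is an illustrative sketch only; model_cache_pod_spec, the image reference, and the GID 1001 are assumptions, not the actual llmd_runtime.py rendering code:

```python
def model_cache_pod_spec(pvc_name: str, fs_group: int = 1001) -> dict:
    """Pod spec fragment: fsGroup makes the PVC group-writable, no root init container."""
    return {
        # kubelet applies this GID to the mounted volume, so no chown-as-root is needed
        "securityContext": {"fsGroup": fs_group},
        "containers": [
            {
                "name": "model-cache",
                "image": "registry.example.com/model-cache:latest",  # placeholder image
                "securityContext": {
                    "runAsNonRoot": True,
                    "allowPrivilegeEscalation": False,
                },
                "volumeMounts": [{"name": "cache", "mountPath": "/models"}],
            }
        ],
        "volumes": [
            {"name": "cache", "persistentVolumeClaim": {"claimName": pvc_name}}
        ],
    }
```

With fsGroup set, the restricted SCC is satisfied and the "permission-fixer" init container becomes unnecessary.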
- Around line 230-259: The run_command wrapper can hang indefinitely; add a
timeout parameter (e.g., timeout_seconds: float | None = 300) to run_command and
pass it into subprocess.run via the timeout argument, updating the function
signature and logging as needed; then update the oc(...) wrapper to accept the
same timeout_seconds parameter and forward it to run_command so callers can
override the default for long-running ops (reference functions: run_command and
oc).
- Around line 281-301: oc_get_json currently treats any non-zero oc exit as "not
found" when ignore_not_found=True; change oc_get_json (and callers like
resource_exists) to inspect result.stderr on failure and only return None when
stderr indicates a true NotFound (e.g., contains "Error from server (NotFound)"
or messages like "No resources found" / "not found"); for any other stderr
(RBAC/Forbidden, "no matches for kind", API errors, etc.) raise an exception (or
re-raise a CalledProcessError) so callers fail fast instead of assuming absence.
Ensure you reference and update oc_get_json's result handling logic and adjust
resource_exists to propagate the error rather than returning False for
non-NotFound oc errors.
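The stderr discrimination suggested above could look roughly like this; classify_oc_failure and the marker strings are illustrative, not the module's actual API:

```python
# Substrings that indicate a genuine "resource absent" outcome from oc.
NOT_FOUND_MARKERS = ("(NotFound)", "No resources found", "not found")


def classify_oc_failure(returncode: int, stderr: str) -> str:
    """Return 'ok' or 'not-found'; raise for unexpected oc errors (RBAC, bad kind, ...)."""
    if returncode == 0:
        return "ok"
    if any(marker in stderr for marker in NOT_FOUND_MARKERS):
        return "not-found"
    # Forbidden, "no matches for kind", API errors, etc. must not look like absence.
    raise RuntimeError(f"oc failed unexpectedly: {stderr.strip()}")
```

oc_get_json would return None only for the 'not-found' case, and resource_exists would let the RuntimeError propagate instead of reporting False.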

In `@projects/llm_d/orchestration/prepare_llmd.py`:
- Around line 146-167: The NodeFeatureDiscovery bootstrap currently always
reapplies the manifest; change prepare_llmd.py to first check for the existing
NFD CR using llmd_runtime.resource_exists with the same name/namespace
(manifest["metadata"]["name"], manifest["metadata"]["namespace"]) and, if it
exists, skip llmd_runtime.apply_manifest and only call llmd_runtime.wait_until/
wait_for_nfd_gpu_labels; otherwise proceed to load_manifest_template,
apply_manifest and then wait_until and wait_for_nfd_gpu_labels as now. Ensure
the predicate and timeout usage remain identical to the existing wait_until
call.
- Around line 57-83: The deletes currently fire and return immediately; modify
the teardown after the llmd_runtime.oc delete calls to poll until the objects
are actually gone: repeatedly call llmd_runtime.oc("get", ...) for the
LLMInferenceService (inference_service_name), the Job and PVC (benchmark_name),
and the pod f"{benchmark_name}-copy" in config.namespace, sleeping briefly
between attempts and enforcing an overall timeout; stop polling when each get
indicates the resource is not found (or returns non-zero), and log/raise an
error if the timeout elapses to avoid proceeding while resources are still
terminating.
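The poll-until-gone loop suggested above, reduced to its core. exists() stands in for an oc get probe of one resource; the injected clock/sleep parameters exist only to make the sketch testable:

```python
import time


def wait_for_deletion(exists, timeout_s=120.0, interval_s=2.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll an exists() callable until it reports the resource is gone."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if not exists():
            return  # resource fully terminated, safe to proceed
        sleep(interval_s)
    raise TimeoutError(f"resource still present after {timeout_s}s")
```

The teardown would call this once per deleted object (LLMInferenceService, Job, PVC, copy pod) after issuing the oc delete commands.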

In `@projects/llm_d/orchestration/test_llmd.py`:
- Around line 318-333: The test currently ignores failures when running
llmd_runtime.oc to cat /results/benchmarks.json; update the block that calls
llmd_runtime.oc (the variable result from llmd_runtime.oc and the subsequent
conditional that calls llmd_runtime.write_text) to treat a non-zero returncode
or empty stdout as a test failure: if result.returncode != 0 or not
result.stdout raise an exception or use the test framework's fail/assert
mechanism with a clear message including result.stderr/output; only call
llmd_runtime.write_text to write the artifact when stdout is present and the
command succeeded.
- Around line 41-61: Only perform the benchmark resource deletes when a
benchmark was actually created: wrap the calls to llmd_runtime.oc that delete
"job,pvc" and the pod ("{benchmark_name}-copy") in a conditional that checks
config.benchmark (truthy) before computing benchmark_name and invoking
llmd_runtime.oc; reference the existing benchmark_name variable and the
llmd_runtime.oc calls, and do not run these deletes when config.benchmark is
None (optionally also skip if namespace_is_managed is False, if that flag
semantics require avoiding cross-namespace deletes).
- Around line 62-74: The test currently always captures namespace events; guard
this behavior by checking the config flag artifacts.capture_namespace_events
before invoking llmd_runtime.oc and writing the file: only run the existing
block that calls llmd_runtime.oc(...), checks events.returncode and writes to
artifacts_dir / "namespace.events.txt" when the configuration flag
artifacts.capture_namespace_events (from config/llm_d/platform.yaml) is true;
locate the block that defines events and uses llmd_runtime.write_text and wrap
it with a conditional that reads the runtime/config object’s
artifacts.capture_namespace_events flag.
- Around line 193-211: The oc exec call that runs curl (the llmd_runtime.oc
invocation producing result) must be bounded per attempt so a hung curl doesn't
stall retries: add a per-attempt timeout (either by passing a timeout argument
to llmd_runtime.oc if it supports one, or by adding curl's --max-time flag in
the command) and catch the timeout/TimeoutExpired outcome around the
llmd_runtime.oc call so the retry loop can move to the next attempt when the
timeout fires; update handling of result/timeouts in the surrounding retry loop
accordingly.
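A sketch of the bounded retry loop, assuming a hypothetical run_oc_exec callable that wraps the oc exec ... curl invocation and honors a per-attempt timeout (e.g. via curl --max-time or a subprocess timeout):

```python
import subprocess


def smoke_request(run_oc_exec, attempts=5, per_attempt_s=30):
    """Retry a bounded smoke probe; a hung attempt times out and the loop moves on."""
    last = None
    for _ in range(attempts):
        try:
            # run_oc_exec is expected to raise TimeoutExpired if the attempt hangs
            result = run_oc_exec(per_attempt_s)
        except subprocess.TimeoutExpired as exc:
            last = exc
            continue
        if result.returncode == 0:
            return result
        last = result
    raise RuntimeError(f"smoke request failed after {attempts} attempts: {last}")
```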

In `@projects/llm_d/README.md`:
- Around line 13-16: The README contains machine-local absolute links to files
(ci.py, prepare_llmd.py, test_llmd.py, llmd_runtime.py); replace each absolute
path with a repo-relative link (e.g., orchestration/ci.py or
./orchestration/ci.py) so the links work on GitHub and other environments,
updating the four entries that mention ci.py, prepare_llmd.py, test_llmd.py, and
llmd_runtime.py accordingly.

---

Outside diff comments:
In `@projects/llm_d/toolbox/capture_isvc_state/main.py`:
- Around line 285-293: The code calls an undefined symbol capture_isvc_state
when building the CLI and invoking the function; replace both references to
capture_isvc_state with the actual function name run (used by
create_dynamic_parser and the invocation), so that parser =
create_dynamic_parser(run, positional_args=["llmisvc_name"]) and the try block
calls run(**kwargs) after env.init(daily_artifact_dir=True), leaving args
parsing and kwargs conversion unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a6025d00-1aab-4bc5-9011-d6cda857033c

📥 Commits

Reviewing files that changed from the base of the PR and between 24d4584 and 7d85dc5.

📒 Files selected for processing (18)
  • config/llm_d/manifests/datasciencecluster.yaml
  • config/llm_d/manifests/epp-approximate-prefix-cache.yaml
  • config/llm_d/manifests/gateway.yaml
  • config/llm_d/manifests/gpu-clusterpolicy.yaml
  • config/llm_d/manifests/llminferenceservice.yaml
  • config/llm_d/manifests/nfd-nodefeaturediscovery.yaml
  • config/llm_d/models.yaml
  • config/llm_d/platform.yaml
  • config/llm_d/presets.yaml
  • config/llm_d/workloads.yaml
  • projects/llm_d/README.md
  • projects/llm_d/orchestration/ci.py
  • projects/llm_d/orchestration/cli.py
  • projects/llm_d/orchestration/llmd_runtime.py
  • projects/llm_d/orchestration/prepare_llmd.py
  • projects/llm_d/orchestration/test_llmd.py
  • projects/llm_d/toolbox/capture_isvc_state/main.py
  • tests/llm_d/test_runtime.py

Comment on lines +12 to +14
allowedRoutes:
  namespaces:
    from: All
Copy link
Copy Markdown


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -path "*/config/llm_d/manifests/gateway.yaml" -type f

Repository: openshift-psap/forge

Length of output: 101


🏁 Script executed:

cat -n config/llm_d/manifests/gateway.yaml 2>/dev/null || find . -name "gateway.yaml" -type f | head -20

Repository: openshift-psap/forge

Length of output: 468


🌐 Web query:

Kubernetes Gateway API allowedRoutes namespaces from All vs Same security

💡 Result:

In the Kubernetes Gateway API, the allowedRoutes.namespaces.from field on a Gateway listener controls which namespaces' Routes can attach to that listener. The options differ significantly in security posture:

  • Same (the default when allowedRoutes is unspecified): only Routes from the Gateway's own namespace can attach. This enforces namespace isolation and least privilege; best for single-team or high-security environments.
  • All: Routes from any namespace in the cluster can attach. Highly permissive; it increases the risk of conflicts or unauthorized exposure and should be used only in fully trusted clusters, backed by additional safeguards (RBAC, NetworkPolicies, admission controllers), and avoided in multi-tenant setups.
  • Selector: Routes from namespaces matching a label selector can attach; a balanced option recommended for shared Gateways in multi-team clusters.
  • None: blocks all Route attachments (Gateway-only setups).

Note that this handshake governs Route-to-Gateway attachment; ReferenceGrant separately governs Route-to-Service/Secret references.



Restrict route attachment scope on this Gateway.

from: All allows any namespace in the cluster to bind routes to this listener. Since the Gateway is shared in openshift-ingress, this creates a cross-namespace exposure risk. Use from: Same (the namespace-isolation default) unless multi-namespace access is explicitly required.

🔒 Proposed safer default
       allowedRoutes:
         namespaces:
-          from: All
+          from: Same
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/llm_d/manifests/gateway.yaml` around lines 12 - 14, The Gateway's
allowedRoutes.namespaces is set to "from: All", which permits any namespace to
bind routes to this listener; change the setting in the Gateway manifest
(allowedRoutes -> namespaces -> from) from "All" to "Same" to enforce namespace
isolation for this listener (unless you have a documented, explicit need for
cross-namespace routing, in which case add justification and tighten other
Gateway controls).

Comment on lines +78 to +96
livenessProbe:
  httpGet:
    path: /health
    port: 8000
    scheme: HTTPS
  initialDelaySeconds: 900
  periodSeconds: 60
  timeoutSeconds: 60
  failureThreshold: 1000
readinessProbe:
  failureThreshold: 10000
  httpGet:
    path: /health
    port: 8000
    scheme: HTTPS
  initialDelaySeconds: 60
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 30


⚠️ Potential issue | 🟠 Major

These probe thresholds will mask real outages.

readinessProbe.failureThreshold: 10000 with a 30s period means the pod can keep receiving traffic for ~83 hours after /health starts failing. livenessProbe.failureThreshold: 1000 with a 60s period defers restarts for ~17 hours. If the goal is to tolerate long model startup, use a startupProbe and keep readiness/liveness responsive.

Suggested direction
       - name: main
+        startupProbe:
+          httpGet:
+            path: /health
+            port: 8000
+            scheme: HTTPS
+          periodSeconds: 30
+          timeoutSeconds: 30
+          failureThreshold: 60
         livenessProbe:
           httpGet:
             path: /health
             port: 8000
             scheme: HTTPS
-          initialDelaySeconds: 900
-          periodSeconds: 60
-          timeoutSeconds: 60
-          failureThreshold: 1000
+          periodSeconds: 30
+          timeoutSeconds: 10
+          failureThreshold: 3
         readinessProbe:
-          failureThreshold: 10000
           httpGet:
             path: /health
             port: 8000
             scheme: HTTPS
-          initialDelaySeconds: 60
-          periodSeconds: 30
+          periodSeconds: 10
           successThreshold: 1
-          timeoutSeconds: 30
+          timeoutSeconds: 10
+          failureThreshold: 3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/llm_d/manifests/llminferenceservice.yaml` around lines 78 - 96, The
current extreme failureThresholds on livenessProbe and readinessProbe
(livenessProbe.failureThreshold and readinessProbe.failureThreshold) will hide
real outages—replace the long-tailing behavior by adding a startupProbe for the
long model cold-start and revert livenessProbe and readinessProbe to responsive
values (e.g., small failureThreshold like 3 and shorter periodSeconds/timeouts)
so restarts and traffic removal happen quickly; specifically, introduce a
startupProbe that targets the same /health endpoint on port 8000 with long
initialDelaySeconds/periodSeconds suitable for model load, and update
livenessProbe and readinessProbe in the manifest to remove the huge
failureThresholds and long delays so they remain sensitive once the startupProbe
succeeds.

Comment thread config/llm_d/models.yaml Outdated
Comment on lines +16 to +25
llama-3-1-8b-instruct-fp8:
  served_model_name: llama-3-1-8b-instruct-fp8
  uri: oci://registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-fp8-dynamic:1.5
  resources:
    requests:
      cpu: "4"
      memory: 8Gi
      nvidia.com/gpu: "1"
    limits:
      nvidia.com/gpu: "1"


⚠️ Potential issue | 🟠 Major

Add CPU and memory limits for the llama model.

projects/llm_d/orchestration/llmd_runtime.py replaces the whole resources block when rendering the inference service, so omitting limits.cpu and limits.memory here does not inherit the template defaults. The llama deployment will run without CPU/memory limits while the qwen deployment is constrained.

Suggested fix
   llama-3-1-8b-instruct-fp8:
     served_model_name: llama-3-1-8b-instruct-fp8
     uri: oci://registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-fp8-dynamic:1.5
     resources:
       requests:
         cpu: "4"
         memory: 8Gi
         nvidia.com/gpu: "1"
       limits:
+        cpu: "4"
+        memory: 8Gi
         nvidia.com/gpu: "1"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/llm_d/models.yaml` around lines 16 - 25, The llama model entry
"llama-3-1-8b-instruct-fp8" is missing resources.limits.cpu and
resources.limits.memory so llmd_runtime.py's replacement of the entire resources
block results in no CPU/memory limits being applied; update the model stanza to
include limits.cpu and limits.memory (e.g., set limits.cpu: "4" and
limits.memory: 8Gi to match the requests) while keeping the existing
nvidia.com/gpu limits, so resources.requests and resources.limits are fully
specified for "llama-3-1-8b-instruct-fp8".

Comment thread projects/llm_d/orchestration/llmd_runtime.py
Comment on lines +41 to +61
        benchmark_name = (
            config.benchmark["job_name"] if config.benchmark else "guidellm-benchmark"
        )
        llmd_runtime.oc(
            "delete",
            "job,pvc",
            benchmark_name,
            "-n",
            namespace,
            "--ignore-not-found=true",
            check=False,
        )
        llmd_runtime.oc(
            "delete",
            "pod",
            f"{benchmark_name}-copy",
            "-n",
            namespace,
            "--ignore-not-found=true",
            check=False,
        )

⚠️ Potential issue | 🟠 Major

Only clean up benchmark resources when a benchmark was actually created.

When config.benchmark is None, this still deletes job,pvc guidellm-benchmark and pod guidellm-benchmark-copy in the target namespace. That is risky for smoke-only runs, especially when namespace_is_managed is False and the namespace can contain unrelated resources.

Suggested fix
-        benchmark_name = (
-            config.benchmark["job_name"] if config.benchmark else "guidellm-benchmark"
-        )
-        llmd_runtime.oc(
-            "delete",
-            "job,pvc",
-            benchmark_name,
-            "-n",
-            namespace,
-            "--ignore-not-found=true",
-            check=False,
-        )
-        llmd_runtime.oc(
-            "delete",
-            "pod",
-            f"{benchmark_name}-copy",
-            "-n",
-            namespace,
-            "--ignore-not-found=true",
-            check=False,
-        )
+        if config.benchmark:
+            benchmark_name = config.benchmark["job_name"]
+            llmd_runtime.oc(
+                "delete",
+                "job,pvc",
+                benchmark_name,
+                "-n",
+                namespace,
+                "--ignore-not-found=true",
+                check=False,
+            )
+            llmd_runtime.oc(
+                "delete",
+                "pod",
+                f"{benchmark_name}-copy",
+                "-n",
+                namespace,
+                "--ignore-not-found=true",
+                check=False,
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/orchestration/test_llmd.py` around lines 41 - 61, Only perform
the benchmark resource deletes when a benchmark was actually created: wrap the
calls to llmd_runtime.oc that delete "job,pvc" and the pod
("{benchmark_name}-copy") in a conditional that checks config.benchmark (truthy)
before computing benchmark_name and invoking llmd_runtime.oc; reference the
existing benchmark_name variable and the llmd_runtime.oc calls, and do not run
these deletes when config.benchmark is None (optionally also skip if
namespace_is_managed is False, if that flag semantics require avoiding
cross-namespace deletes).

Comment on lines +62 to +74
        events = llmd_runtime.oc(
            "get",
            "events",
            "-n",
            namespace,
            "--sort-by=.metadata.creationTimestamp",
            check=False,
            capture_output=True,
        )
        if events.returncode == 0 and events.stdout:
            llmd_runtime.write_text(
                artifacts_dir / "namespace.events.txt", events.stdout
            )

⚠️ Potential issue | 🟡 Minor

Honor artifacts.capture_namespace_events.

config/llm_d/platform.yaml exposes this as a knob, but this block always captures and writes namespace events. That makes the config ineffective and forces extra artifact collection even when disabled.

Suggested fix
-        events = llmd_runtime.oc(
-            "get",
-            "events",
-            "-n",
-            namespace,
-            "--sort-by=.metadata.creationTimestamp",
-            check=False,
-            capture_output=True,
-        )
-        if events.returncode == 0 and events.stdout:
-            llmd_runtime.write_text(
-                artifacts_dir / "namespace.events.txt", events.stdout
-            )
+        if config.platform["artifacts"].get("capture_namespace_events", False):
+            events = llmd_runtime.oc(
+                "get",
+                "events",
+                "-n",
+                namespace,
+                "--sort-by=.metadata.creationTimestamp",
+                check=False,
+                capture_output=True,
+            )
+            if events.returncode == 0 and events.stdout:
+                llmd_runtime.write_text(
+                    artifacts_dir / "namespace.events.txt", events.stdout
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/orchestration/test_llmd.py` around lines 62 - 74, The test
currently always captures namespace events; guard this behavior by checking the
config flag artifacts.capture_namespace_events before invoking llmd_runtime.oc
and writing the file: only run the existing block that calls
llmd_runtime.oc(...), checks events.returncode and writes to artifacts_dir /
"namespace.events.txt" when the configuration flag
artifacts.capture_namespace_events (from config/llm_d/platform.yaml) is true;
locate the block that defines events and uses llmd_runtime.write_text and wrap
it with a conditional that reads the runtime/config object’s
artifacts.capture_namespace_events flag.

Comment on lines +193 to +211
        result = llmd_runtime.oc(
            "exec",
            "-n",
            namespace,
            f"deployment/{deployment_name}",
            "-c",
            "main",
            "--",
            "curl",
            "-k",
            "-sSf",
            f"{endpoint_url}{config.platform['smoke']['endpoint_path']}",
            "-H",
            "Content-Type: application/json",
            "-d",
            json.dumps(payload),
            check=False,
            capture_output=True,
        )

⚠️ Potential issue | 🟠 Major

Bound each smoke-request attempt with a timeout.

The retry loop does not help if curl blocks inside oc exec; one hung request can stall the whole CI run beyond the configured retry budget. Add per-attempt timeouts so failures actually progress to the next retry.

Suggested fix
         result = llmd_runtime.oc(
             "exec",
             "-n",
             namespace,
             f"deployment/{deployment_name}",
             "-c",
             "main",
             "--",
             "curl",
             "-k",
             "-sSf",
+            "--connect-timeout",
+            "10",
+            "--max-time",
+            str(delay),
             f"{endpoint_url}{config.platform['smoke']['endpoint_path']}",
             "-H",
             "Content-Type: application/json",
             "-d",
             json.dumps(payload),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/orchestration/test_llmd.py` around lines 193 - 211, The oc
exec call that runs curl (the llmd_runtime.oc invocation producing result) must
be bounded per attempt so a hung curl doesn't stall retries: add a per-attempt
timeout (either by passing a timeout argument to llmd_runtime.oc if it supports
one, or by adding curl's --max-time flag in the command) and catch the
timeout/TimeoutExpired outcome around the llmd_runtime.oc call so the retry loop
can move to the next attempt when the timeout fires; update handling of
result/timeouts in the surrounding retry loop accordingly.

Comment on lines +318 to +333
    result = llmd_runtime.oc(
        "exec",
        "-n",
        namespace,
        f"{benchmark_name}-copy",
        "--",
        "cat",
        "/results/benchmarks.json",
        check=False,
        capture_output=True,
    )
    if result.returncode == 0 and result.stdout:
        llmd_runtime.write_text(
            config.artifact_dir / "artifacts" / "results" / "benchmarks.json",
            result.stdout,
        )

⚠️ Potential issue | 🟠 Major

Treat missing benchmark output as a failure.

If oc exec ... cat /results/benchmarks.json fails or returns empty output, the run still succeeds and just skips writing the artifact. That can make benchmark jobs look green while producing no usable result.

Suggested fix
     result = llmd_runtime.oc(
         "exec",
         "-n",
         namespace,
         f"{benchmark_name}-copy",
         "--",
         "cat",
         "/results/benchmarks.json",
         check=False,
         capture_output=True,
     )
-    if result.returncode == 0 and result.stdout:
-        llmd_runtime.write_text(
-            config.artifact_dir / "artifacts" / "results" / "benchmarks.json",
-            result.stdout,
-        )
+    if result.returncode != 0 or not result.stdout:
+        raise RuntimeError(
+            f"Failed to copy GuideLLM results for {benchmark_name} from /results/benchmarks.json"
+        )
+    llmd_runtime.write_text(
+        config.artifact_dir / "artifacts" / "results" / "benchmarks.json",
+        result.stdout,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/orchestration/test_llmd.py` around lines 318 - 333, The test
currently ignores failures when running llmd_runtime.oc to cat
/results/benchmarks.json; update the block that calls llmd_runtime.oc (the
variable result from llmd_runtime.oc and the subsequent conditional that calls
llmd_runtime.write_text) to treat a non-zero returncode or empty stdout as a
test failure: if result.returncode != 0 or not result.stdout raise an exception
or use the test framework's fail/assert mechanism with a clear message including
result.stderr/output; only call llmd_runtime.write_text to write the artifact
when stdout is present and the command succeeded.

Comment thread projects/llm_d/README.md Outdated
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 13, 2026
Comment thread config/llm_d/models.yaml Outdated
Comment thread projects/llm_d/orchestration/ci.py Outdated
Comment thread projects/llm_d/orchestration/prepare_llmd.py Outdated
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2026
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 16, 2026
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (1)
projects/llm_d/orchestration/manifests/llminferenceservice.yaml (1)

78-96: ⚠️ Potential issue | 🟠 Major

Keep readiness/liveness responsive; use startupProbe for cold start.

This is still masking real outages: readiness can remain passing for ~83 hours and liveness can defer restarts for ~17 hours. Move the long model-load tolerance to startupProbe, then keep readiness/liveness thresholds small.

Read-only verification
#!/bin/bash
# Verifies the current probe thresholds in the LLMInferenceService manifest.
python - <<'PY'
from pathlib import Path

for path in Path(".").rglob("llminferenceservice.yaml"):
    text = path.read_text()
    if "failureThreshold: 10000" in text or "failureThreshold: 1000" in text:
        print(f"{path}: contains extreme probe failureThreshold values")
PY
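As an illustrative sketch of the suggested split (the path, port, and numbers below are assumptions for the example, not values taken from the manifest):

```yaml
# Hypothetical probe split: startupProbe absorbs the long model-load window,
# so readiness/liveness can stay tight once the server is up.
startupProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 30
  failureThreshold: 120   # tolerates up to ~60 min of cold start
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3     # drop out of rotation quickly on failure
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3     # restart a hung server promptly
```

Kubernetes disables readiness and liveness probes until the startupProbe succeeds, so the long tolerance applies only during startup.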
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/orchestration/manifests/llminferenceservice.yaml` around lines
78 - 96, The manifest currently buries long tolerances in livenessProbe and
readinessProbe (failureThreshold: 1000 / 10000, initialDelaySeconds: 900) which
masks outages; move the long model-load tolerance into a startupProbe and give
livenessProbe/readinessProbe conservative values so failures are detected
quickly: create a startupProbe with the long
initialDelaySeconds/periodSeconds/failureThreshold to cover cold starts (use the
existing path/port/scheme), then reduce readinessProbe and livenessProbe
failureThreshold to small values (e.g., 3), set smaller
initialDelaySeconds/periodSeconds/timeoutSeconds for responsiveness, and keep
the same httpGet path/port/scheme fields (livenessProbe, readinessProbe,
startupProbe, failureThreshold, initialDelaySeconds, periodSeconds,
timeoutSeconds).
🧹 Nitpick comments (4)
projects/llm_d/toolbox/prepare_model_cache/main.py (3)

42-45: Redundant capture_model_cache_state call on the happy path.

On successful download capture_model_cache_state runs twice: once in the finally at line 122, then again at line 44 after annotate_model_cache_pvc. This double-writes spec.json, the PVC/Job YAML, and pod logs. Suggest removing the post-success call in run_prepare_model_cache (the finally already handles both success and failure paths), or making the capture idempotent/incremental if the post-annotate state is meaningfully different.

Also applies to: 114-122

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/prepare_model_cache/main.py` around lines 42 - 45, The
code currently calls capture_model_cache_state twice on the success path: once
after llmd_runtime.annotate_model_cache_pvc() in run_prepare_model_cache and
again in the finally block that already handles both success and failure; remove
the redundant capture_model_cache_state(config, cache_spec) call that
immediately follows llmd_runtime.annotate_model_cache_pvc(cache_spec) in
run_prepare_model_cache so the single capture in the finally block is the only
writer (alternatively, if the post-annotate state must differ, make
capture_model_cache_state idempotent or add a flag to write only delta updates).

139-151: Use check=False for PVC diagnostic capture too.

capture_model_cache_state is called from a finally and is diagnostic in nature. The PVC capture at lines 139-144 defaults to check=True, so if the PVC was never created (e.g., apply_manifest failed before ensure_model_cache_pvc completed) or was deleted externally, the capture itself will raise and shadow the original download error. The Job capture already uses check=False; recommend the same here.

♻️ Proposed change
     capture_resource_yaml(
         "persistentvolumeclaim",
         cache_spec.pvc_name,
         cache_spec.namespace,
         artifact_dir / "pvc.yaml",
+        check=False,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/prepare_model_cache/main.py` around lines 139 - 151,
The PVC diagnostic capture in capture_model_cache_state currently calls
capture_resource_yaml("persistentvolumeclaim", cache_spec.pvc_name,
cache_spec.namespace, artifact_dir / "pvc.yaml") with the default check=True,
which can raise and hide the original error; change this call to pass
check=False (matching the Job capture), i.e. capture_resource_yaml(...,
artifact_dir / "pvc.yaml", check=False), so that missing PVCs are safely ignored (refer to
capture_resource_yaml, capture_model_cache_state, cache_spec.pvc_name,
cache_spec.namespace, and ensure_model_cache_pvc).

100-107: Hardcoded 120s/5s deletion-wait contradicts the configured wait_timeout_seconds/poll_interval_seconds.

Everywhere else in this file you honor config.model_cache["download"]["wait_timeout_seconds"] and poll_interval_seconds, but the job-deletion wait here uses magic numbers. If a namespace is slow to reclaim finalizers, 120s will fire before the subsequent apply_manifest and produce a confusing AlreadyExists on the next Job. Prefer driving these from the same config keys (or a dedicated delete_timeout_seconds) for consistency.
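The suggested change can be sketched as a small helper; resolve_delete_wait is a hypothetical name, and the config keys and the 120/5 fallbacks come from the comment above:

```python
# Hypothetical helper: derive the job-deletion wait from the same config keys
# used elsewhere in the file, falling back to the previous magic numbers
# (120s timeout, 5s poll) when the keys are absent.
def resolve_delete_wait(model_cache_config):
    download = model_cache_config.get("download", {})
    timeout = download.get("wait_timeout_seconds", 120)
    interval = download.get("poll_interval_seconds", 5)
    return timeout, interval


# With the keys configured, the deletion wait follows them; without, it keeps
# the prior behavior.
cfg = {"download": {"wait_timeout_seconds": 600, "poll_interval_seconds": 10}}
print(resolve_delete_wait(cfg))   # (600, 10)
print(resolve_delete_wait({}))    # (120, 5)
```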

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/prepare_model_cache/main.py` around lines 100 - 107,
The wait for job deletion uses hardcoded timeout_seconds=120 and
interval_seconds=5 which contradicts the configured download timeouts; update
the llmd_runtime.wait_until call (the one referencing
cache_spec.download_job_name and cache_spec.namespace) to use the configured
values from config.model_cache["download"] (use its "wait_timeout_seconds" for
timeout_seconds and the poll/poll_interval_seconds key used elsewhere—e.g.
"poll_interval_seconds"—for interval_seconds), and ensure sensible fallbacks if
those config keys are missing to preserve prior behavior.
projects/llm_d/orchestration/cli.py (1)

47-52: post_cleanup duplicates pre_cleanup verbatim — confirm intent.

Both subcommands call prepare_llmd.cleanup() with the same docstring. If there truly is no distinction between pre- and post-cleanup behavior today, consider either factoring them to a shared helper with distinct docstrings, or documenting in a comment why the two phases are currently identical. If there is a semantic difference (e.g., post-cleanup should additionally tear down namespace/operators), that divergence is missing here.
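The shared-helper option can be sketched as follows; _FakePrepareLlmd stands in for the real prepare_llmd module, and _run_cleanup is a hypothetical name:

```python
# Sketch: factor the identical behavior into one helper while keeping
# distinct docstrings that record each phase's intent.
class _FakePrepareLlmd:
    """Stand-in for the real prepare_llmd module."""

    def __init__(self):
        self.calls = []

    def cleanup(self):
        self.calls.append("cleanup")


prepare_llmd = _FakePrepareLlmd()


def _run_cleanup():
    prepare_llmd.cleanup()


def pre_cleanup():
    """Clean the namespace before preparation."""
    _run_cleanup()


def post_cleanup():
    """Clean the namespace after the test run (currently identical to pre_cleanup)."""
    _run_cleanup()


pre_cleanup()
post_cleanup()
print(prepare_llmd.calls)  # ['cleanup', 'cleanup']
```

If post-cleanup later needs to tear down operators or the namespace, the divergence has an obvious home in post_cleanup without touching pre_cleanup.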

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/orchestration/cli.py` around lines 47 - 52, post_cleanup
currently duplicates pre_cleanup by calling prepare_llmd.cleanup() and sharing
the same docstring; either make their intent explicit or implement the missing
divergence. Either refactor both commands to call a shared helper (e.g.,
_run_cleanup()) and give pre_cleanup and post_cleanup distinct docstrings, or
modify post_cleanup to perform the additional post-phase actions (for example
calling the appropriate teardown/namespace/operator cleanup method on
prepare_llmd instead of cleanup()) and update its docstring and/or add a comment
explaining why behaviors differ; ensure you update or add tests/comments to
reflect the chosen approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/core/library/config.py`:
- Around line 377-387: When loading chunks in the loop (the yaml.safe_load
result assigned to chunk_value for each chunk_file from
config_dir_src.glob("*.yaml")), validate that chunk_value is a dict before
merging or assigning: if chunk_value is None or not isinstance(chunk_value,
dict) then skip the file (optionally log or warn) so you don't overwrite
existing config entries with None or scalars; otherwise proceed to merge via
_merge_config_values(config[key], chunk_value) or set config[key] = chunk_value
when missing. Ensure you reference chunk_file.stem when deciding the key and use
yaml.safe_load result type-checking to guard merges.

In `@projects/llm_d/orchestration/presets.d/presets.yaml`:
- Around line 17-19: The "cks" preset currently only extends "smoke" so
runtime.selected_preset remains "smoke"; update the "cks" preset (the preset
named "cks") to explicitly set runtime.selected_preset to "cks" (i.e., add a
runtime.selected_preset: "cks" override in the cks preset) so that after preset
application the selected_preset correctly reflects "cks" rather than inheriting
"smoke".

In `@projects/llm_d/toolbox/cleanup/main.py`:
- Around line 71-106: The pod deletion uses the selector
"forge.openshift.io/project=llm_d" but the wait predicate calls _llm_d_pods_gone
which checks pods by "app.kubernetes.io/name={inference_service_name}", causing
a selector mismatch; update the wait to use the same selector as the delete (or
change _llm_d_pods_gone to accept/use a selector parameter) so the
llmd_runtime.wait_until predicate checks for pods with
"forge.openshift.io/project=llm_d" (and respect the preserve label if needed)
rather than "app.kubernetes.io/name={inference_service_name}", referencing
llmd_runtime.oc, llmd_runtime.wait_until, _llm_d_pods_gone,
inference_service_name and namespace to locate the change.
- Line 1: Run the ruff formatter on the script containing the shebang line
"#!/usr/bin/env python3" to satisfy CI: execute ruff format on that file
locally, review the formatting changes, stage and commit the updated file, and
push the commit so the ruff format --check job passes.

In `@projects/llm_d/toolbox/prepare_model_cache/main.py`:
- Line 1: Run the code formatter ruff on the module to fix style and unblock CI:
execute `ruff format` for this module (preserving the existing shebang line
"#!/usr/bin/env python3"), commit the resulting changes, and push; ensure the
file projects/llm_d/toolbox/prepare_model_cache/main.py is formatted so the
`ruff format --check` job will pass.

In `@projects/llm_d/toolbox/test/main.py`:
- Around line 79-103: The code currently deletes the LLMInferenceService
resource and only waits for pods to disappear (_old_pods_gone) before
reapplying, which can race if the CR is still terminating; change the flow to
also wait until the LLMInferenceService CR no longer exists by adding a
predicate that uses llmd_runtime.oc_get_json("llminferenceservice", name=name,
namespace=namespace, ignore_not_found=True) and returns True when that call
yields no resource, then call llmd_runtime.wait_until (similar to the existing
wait) to await the CR deletion before calling
llmd_runtime.render_inference_service and llmd_runtime.apply_manifest; reference
the existing delete call (llmd_runtime.oc(... "delete", "llminferenceservice",
name, ...)), the pod-check function _old_pods_gone, and the later calls
llmd_runtime.render_inference_service and llmd_runtime.apply_manifest when
making this change.

---

Duplicate comments:
In `@projects/llm_d/orchestration/manifests/llminferenceservice.yaml`:
- Around line 78-96: The manifest currently buries long tolerances in
livenessProbe and readinessProbe (failureThreshold: 1000 / 10000,
initialDelaySeconds: 900) which masks outages; move the long model-load
tolerance into a startupProbe and give livenessProbe/readinessProbe conservative
values so failures are detected quickly: create a startupProbe with the long
initialDelaySeconds/periodSeconds/failureThreshold to cover cold starts (use the
existing path/port/scheme), then reduce readinessProbe and livenessProbe
failureThreshold to small values (e.g., 3), set smaller
initialDelaySeconds/periodSeconds/timeoutSeconds for responsiveness, and keep
the same httpGet path/port/scheme fields (livenessProbe, readinessProbe,
startupProbe, failureThreshold, initialDelaySeconds, periodSeconds,
timeoutSeconds).

---

Nitpick comments:
In `@projects/llm_d/orchestration/cli.py`:
- Around line 47-52: post_cleanup currently duplicates pre_cleanup by calling
prepare_llmd.cleanup() and sharing the same docstring; either make their intent
explicit or implement the missing divergence. Either refactor both commands to
call a shared helper (e.g., _run_cleanup()) and give pre_cleanup and
post_cleanup distinct docstrings, or modify post_cleanup to perform the
additional post-phase actions (for example calling the appropriate
teardown/namespace/operator cleanup method on prepare_llmd instead of cleanup())
and update its docstring and/or add a comment explaining why behaviors differ;
ensure you update or add tests/comments to reflect the chosen approach.

In `@projects/llm_d/toolbox/prepare_model_cache/main.py`:
- Around line 42-45: The code currently calls capture_model_cache_state twice on
the success path: once after llmd_runtime.annotate_model_cache_pvc() in
run_prepare_model_cache and again in the finally block that already handles both
success and failure; remove the redundant capture_model_cache_state(config,
cache_spec) call that immediately follows
llmd_runtime.annotate_model_cache_pvc(cache_spec) in run_prepare_model_cache so
the single capture in the finally block is the only writer (alternatively, if
the post-annotate state must differ, make capture_model_cache_state idempotent
or add a flag to write only delta updates).
- Around line 139-151: The PVC diagnostic capture in capture_model_cache_state
currently calls capture_resource_yaml("persistentvolumeclaim",
cache_spec.pvc_name, cache_spec.namespace, artifact_dir / "pvc.yaml") with the
default check=True, which can raise and hide the original error; change this
call to pass check=False (matching the Job capture) so
capture_resource_yaml(..., artifact_dir / "pvc.yaml", check=False) to safely
ignore missing PVCs (refer to capture_resource_yaml, capture_model_cache_state,
cache_spec.pvc_name, cache_spec.namespace, and ensure_model_cache_pvc).
- Around line 100-107: The wait for job deletion uses hardcoded
timeout_seconds=120 and interval_seconds=5 which contradicts the configured
download timeouts; update the llmd_runtime.wait_until call (the one referencing
cache_spec.download_job_name and cache_spec.namespace) to use the configured
values from config.model_cache["download"] (use its "wait_timeout_seconds" for
timeout_seconds and the poll/poll_interval_seconds key used elsewhere—e.g.
"poll_interval_seconds"—for interval_seconds), and ensure sensible fallbacks if
those config keys are missing to preserve prior behavior.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 132ea775-98fd-41db-84ba-93cf181751a7

📥 Commits

Reviewing files that changed from the base of the PR and between 7d85dc5 and 3721e71.

📒 Files selected for processing (30)
  • projects/core/dsl/log.py
  • projects/core/dsl/runtime.py
  • projects/core/library/config.py
  • projects/llm_d/README.md
  • projects/llm_d/orchestration/ci.py
  • projects/llm_d/orchestration/cli.py
  • projects/llm_d/orchestration/config.d/model_cache.yaml
  • projects/llm_d/orchestration/config.d/models.yaml
  • projects/llm_d/orchestration/config.d/platform.yaml
  • projects/llm_d/orchestration/config.d/runtime.yaml
  • projects/llm_d/orchestration/config.d/scheduler_profiles.yaml
  • projects/llm_d/orchestration/config.d/workloads.yaml
  • projects/llm_d/orchestration/config.yaml
  • projects/llm_d/orchestration/llmd_runtime.py
  • projects/llm_d/orchestration/manifests/datasciencecluster.yaml
  • projects/llm_d/orchestration/manifests/gateway.yaml
  • projects/llm_d/orchestration/manifests/gpu-clusterpolicy.yaml
  • projects/llm_d/orchestration/manifests/llminferenceservice.yaml
  • projects/llm_d/orchestration/manifests/nfd-nodefeaturediscovery.yaml
  • projects/llm_d/orchestration/prepare_llmd.py
  • projects/llm_d/orchestration/presets.d/cks.yaml
  • projects/llm_d/orchestration/presets.d/presets.yaml
  • projects/llm_d/orchestration/scheduler_profiles/approximate-prefix-cache.yaml
  • projects/llm_d/orchestration/test_llmd.py
  • projects/llm_d/toolbox/capture_llmisvc_state/main.py
  • projects/llm_d/toolbox/cleanup/main.py
  • projects/llm_d/toolbox/prepare/main.py
  • projects/llm_d/toolbox/prepare_model_cache/main.py
  • projects/llm_d/toolbox/test/main.py
  • tests/llm_d/test_runtime.py
💤 Files with no reviewable changes (1)
  • projects/llm_d/orchestration/presets.d/cks.yaml
✅ Files skipped from review due to trivial changes (13)
  • projects/core/dsl/runtime.py
  • projects/llm_d/orchestration/config.d/scheduler_profiles.yaml
  • projects/core/dsl/log.py
  • projects/llm_d/orchestration/manifests/gateway.yaml
  • projects/llm_d/orchestration/manifests/nfd-nodefeaturediscovery.yaml
  • projects/llm_d/orchestration/config.d/workloads.yaml
  • projects/llm_d/orchestration/config.d/runtime.yaml
  • projects/llm_d/toolbox/capture_llmisvc_state/main.py
  • projects/llm_d/orchestration/manifests/gpu-clusterpolicy.yaml
  • projects/llm_d/orchestration/config.d/model_cache.yaml
  • projects/llm_d/orchestration/scheduler_profiles/approximate-prefix-cache.yaml
  • projects/llm_d/orchestration/manifests/datasciencecluster.yaml
  • projects/llm_d/orchestration/config.d/models.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • projects/llm_d/orchestration/ci.py
  • tests/llm_d/test_runtime.py
  • projects/llm_d/orchestration/llmd_runtime.py

Comment on lines +377 to +387
    for chunk_file in sorted(config_dir_src.glob("*.yaml")):
        with open(chunk_file) as chunk_f:
            chunk_value = yaml.safe_load(chunk_f)

        key = chunk_file.stem
        if key in config:
            config[key] = _merge_config_values(config[key], chunk_value)
        else:
            config[key] = chunk_value

    return config

⚠️ Potential issue | 🟡 Minor

Empty or non-dict chunk files may silently overwrite existing config.

If a config.d/*.yaml chunk is empty or contains a non-dict top-level document, yaml.safe_load returns None (or a scalar/list), and line 383 will call _merge_config_values(existing_dict, None), which returns copy.deepcopy(None) — effectively replacing any pre-existing config[key] dict with None. Similarly, for a missing key, you'd set config[key] = None, silently clobbering structure expected downstream. Consider skipping empty chunks and validating that each chunk root is a dict.

🛡️ Proposed guard
     for chunk_file in sorted(config_dir_src.glob("*.yaml")):
         with open(chunk_file) as chunk_f:
             chunk_value = yaml.safe_load(chunk_f)

+        if chunk_value is None:
+            continue
+        if not isinstance(chunk_value, dict):
+            raise ValueError(
+                f"Config chunk {chunk_file} must be a mapping, got {type(chunk_value).__name__}"
+            )
+
         key = chunk_file.stem
         if key in config:
             config[key] = _merge_config_values(config[key], chunk_value)
         else:
             config[key] = chunk_value

Note that this also enforces the convention that each chunk file's top-level content becomes nested under a key named after the file stem — worth documenting since it is a surprising default for users familiar with typical *.d/ merge semantics (root-level merge).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for chunk_file in sorted(config_dir_src.glob("*.yaml")):
    with open(chunk_file) as chunk_f:
        chunk_value = yaml.safe_load(chunk_f)

    key = chunk_file.stem
    if key in config:
        config[key] = _merge_config_values(config[key], chunk_value)
    else:
        config[key] = chunk_value

return config

for chunk_file in sorted(config_dir_src.glob("*.yaml")):
    with open(chunk_file) as chunk_f:
        chunk_value = yaml.safe_load(chunk_f)

    if chunk_value is None:
        continue
    if not isinstance(chunk_value, dict):
        raise ValueError(
            f"Config chunk {chunk_file} must be a mapping, got {type(chunk_value).__name__}"
        )

    key = chunk_file.stem
    if key in config:
        config[key] = _merge_config_values(config[key], chunk_value)
    else:
        config[key] = chunk_value

return config
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/core/library/config.py` around lines 377 - 387, When loading chunks
in the loop (the yaml.safe_load result assigned to chunk_value for each
chunk_file from config_dir_src.glob("*.yaml")), validate that chunk_value is a
dict before merging or assigning: if chunk_value is None or not
isinstance(chunk_value, dict) then skip the file (optionally log or warn) so you
don't overwrite existing config entries with None or scalars; otherwise proceed
to merge via _merge_config_values(config[key], chunk_value) or set config[key] =
chunk_value when missing. Ensure you reference chunk_file.stem when deciding the
key and use yaml.safe_load result type-checking to guard merges.
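Both the stem-keyed nesting convention and the None-clobbering failure mode can be reproduced with a toy merge. `merge_values` below is a hypothetical stand-in for `_merge_config_values`, assuming it deep-merges dicts and deep-copies everything else:

```python
import copy

def merge_values(base, override):
    # Hypothetical stand-in for _merge_config_values: deep-merge two dicts,
    # with the override winning on conflicts; non-dict values are deep-copied.
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = merge_values(merged[key], value) if key in merged else copy.deepcopy(value)
        return merged
    return copy.deepcopy(override)

# A chunk file named models.yaml lands under config["models"] (the file stem),
# rather than merging at the root as classic *.d/ directories do.
config = {"models": {"default": "llama", "cache": {"size": "10Gi"}}}
chunk = {"cache": {"size": "20Gi"}, "extra": True}  # as if loaded from models.yaml
config["models"] = merge_values(config["models"], chunk)
print(config["models"])  # → {'default': 'llama', 'cache': {'size': '20Gi'}, 'extra': True}

# The failure mode from the finding: an empty chunk loads as None and
# silently replaces the existing mapping.
print(merge_values(config["models"], None))  # → None
```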

Comment on lines +17 to +19
cks:
  extends:
    - smoke

⚠️ Potential issue | 🟡 Minor

cks preset leaves runtime.selected_preset as smoke.

Because cks only extends smoke and doesn't override runtime.selected_preset, after preset application runtime.selected_preset will be "smoke" rather than "cks". If this key is used anywhere for identification, tagging artifacts, or conditional branching, the cks run will be indistinguishable from a smoke run. Consider adding an explicit override for consistency with the other presets:

✏️ Proposed addition
 cks:
   extends:
     - smoke
+  runtime.selected_preset: cks
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/orchestration/presets.d/presets.yaml` around lines 17 - 19,
The "cks" preset currently only extends "smoke" so runtime.selected_preset
remains "smoke"; update the "cks" preset (the preset named "cks") to explicitly
set runtime.selected_preset to "cks" (i.e., add a runtime.selected_preset: "cks"
override in the cks preset) so that after preset application the selected_preset
correctly reflects "cks" rather than inheriting "smoke".
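A toy resolver shows why the inherited key keeps the parent's value. `resolve_preset` and its flat-dict presets are illustrative assumptions, not Forge's actual loader:

```python
def resolve_preset(presets: dict, name: str) -> dict:
    # Toy resolver: apply extended presets first, then the preset's own keys,
    # so a child with no override of its own inherits the parent's value.
    resolved = {}
    for parent in presets[name].get("extends", []):
        resolved.update(resolve_preset(presets, parent))
    resolved.update({k: v for k, v in presets[name].items() if k != "extends"})
    return resolved

presets = {
    "smoke": {"runtime.selected_preset": "smoke", "workload": "smoke"},
    "cks": {"extends": ["smoke"]},  # no explicit selected_preset override
}
print(resolve_preset(presets, "cks")["runtime.selected_preset"])  # → smoke

# With the proposed explicit override, the child identifies itself correctly.
presets["cks"]["runtime.selected_preset"] = "cks"
print(resolve_preset(presets, "cks")["runtime.selected_preset"])  # → cks
```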

@@ -0,0 +1,123 @@
#!/usr/bin/env python3

⚠️ Potential issue | 🟡 Minor

Run ruff format to unblock CI.

The ruff format --check job is failing on this file. Please run ruff format projects/llm_d/toolbox/cleanup/main.py locally and commit the result.

🧰 Tools
🪛 GitHub Actions: Ruff

[error] 1-1: ruff format --check reported formatting issues; file would be reformatted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/cleanup/main.py` at line 1, Run the ruff formatter on
the script containing the shebang line "#!/usr/bin/env python3" to satisfy CI:
execute ruff format on that file locally, review the formatting changes, stage
and commit the updated file, and push the commit so the ruff format --check job
passes.

Comment on lines +71 to +106
llmd_runtime.oc(
    "delete",
    "pod",
    "-n",
    namespace,
    "-l",
    "forge.openshift.io/project=llm_d",
    "--ignore-not-found=true",
    check=False,
)
llmd_runtime.oc(
    "delete",
    "pvc",
    "-n",
    namespace,
    "-l",
    "forge.openshift.io/project=llm_d,forge.openshift.io/preserve!=true",
    "--ignore-not-found=true",
    check=False,
)

llmd_runtime.wait_until(
    f"llminferenceservice/{inference_service_name} deletion in {namespace}",
    timeout_seconds=cleanup_timeout_seconds,
    interval_seconds=10,
    predicate=lambda: not llmd_runtime.resource_exists(
        "llminferenceservice", inference_service_name, namespace=namespace
    ),
)

llmd_runtime.wait_until(
    f"llm-d workload pods deletion in {namespace}",
    timeout_seconds=cleanup_timeout_seconds,
    interval_seconds=10,
    predicate=lambda: _llm_d_pods_gone(namespace, inference_service_name),
)

⚠️ Potential issue | 🟠 Major

Selector mismatch between bulk pod delete and pod-gone wait.

At lines 71-80 you delete pods with forge.openshift.io/project=llm_d, but _llm_d_pods_gone (lines 109-116, called from the wait at 101-106) checks pods with app.kubernetes.io/name={inference_service_name}. These two label sets are not guaranteed to overlap:

  • Pods that were deleted (project=llm_d) may not carry app.kubernetes.io/name=<isvc-name>, so the wait passes immediately even while pods are still terminating.
  • Conversely, InferenceService-owned pods may not carry the forge.openshift.io/project label, so they won't be affected by the bulk delete yet the wait would block on them.

Either issue the bulk delete with the same selector used for the wait, or (preferred) wait using the same forge.openshift.io/project=llm_d selector that was actually used to delete.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/cleanup/main.py` around lines 71 - 106, The pod
deletion uses the selector "forge.openshift.io/project=llm_d" but the wait
predicate calls _llm_d_pods_gone which checks pods by
"app.kubernetes.io/name={inference_service_name}", causing a selector mismatch;
update the wait to use the same selector as the delete (or change
_llm_d_pods_gone to accept/use a selector parameter) so the
llmd_runtime.wait_until predicate checks for pods with
"forge.openshift.io/project=llm_d" (and respect the preserve label if needed)
rather than "app.kubernetes.io/name={inference_service_name}", referencing
llmd_runtime.oc, llmd_runtime.wait_until, _llm_d_pods_gone,
inference_service_name and namespace to locate the change.
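One way to guarantee the delete and the wait observe the same pod set is to share a single selector constant and poll `oc get pods -l <selector> -o json` until the item list is empty. The JSON-parsing half of that poll is sketched below; the selector string comes from the source, while the helper name is hypothetical:

```python
import json

# Shared constant: the same selector is passed both to `oc delete pod -l ...`
# and to the `oc get pods -l ... -o json` poll backing the wait.
DELETE_SELECTOR = "forge.openshift.io/project=llm_d"

def pods_gone(pod_list_json: str) -> bool:
    # Hypothetical helper: True once the pod list for DELETE_SELECTOR is empty.
    return not json.loads(pod_list_json).get("items", [])

print(pods_gone('{"items": []}'))  # → True
print(pods_gone('{"items": [{"metadata": {"name": "decode-0"}}]}'))  # → False
```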

Comment thread projects/llm_d/toolbox/prepare_model_cache/main.py
Comment thread projects/llm_d/toolbox/test/main.py
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/llm_d/test_runtime.py (2)

398-405: Minor: replace the (_ for _ in ()).throw(...) trick with a plain helper.

Lines 404 and 472 use a generator-expression .throw() to make a lambda raise. It works, but it's easy to misread and doesn't carry forward through refactors (e.g., adding a second assertion before the raise). A def helper or pytest.MonkeyPatch with a named function is clearer:

♻️ Alternative
def _raise_terminal(*_args, **_kwargs):
    raise RuntimeError("terminal failure")

llmd_runtime.wait_until(
    "test condition",
    timeout_seconds=1,
    interval_seconds=0,
    predicate=_raise_terminal,
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm_d/test_runtime.py` around lines 398 - 405, Replace the
generator-expression trick used to make the predicate raise in
test_wait_until_reraises_runtime_error with a named helper function: create a
small function (e.g., _raise_terminal) that raises RuntimeError("terminal
failure") and pass that function as the predicate to llmd_runtime.wait_until;
update the similar occurrence mentioned (another test around line 472) to use
the same helper so the tests read clearly and survive refactors.
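Both forms raise identically; the difference is purely readability. A standalone demonstration (not the project's test code):

```python
# The generator-expression trick: .throw() on a fresh generator re-raises
# the exception at the call site, which is how a lambda can "raise".
trick = lambda *_a, **_k: (_ for _ in ()).throw(RuntimeError("terminal failure"))

def _raise_terminal(*_args, **_kwargs):
    # Named helper: identical behavior, easier to read and to extend later.
    raise RuntimeError("terminal failure")

caught = []
for predicate in (trick, _raise_terminal):
    try:
        predicate()
    except RuntimeError as exc:
        caught.append(str(exc))
print(caught)  # → ['terminal failure', 'terminal failure']
```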

25-101: Extract the repeated load_run_configuration setup into a fixture.

The three-line incantation (setenv FORGE_CONFIG_OVERRIDES, create artifact_dir, call load_run_configuration(cwd=tmp_path, artifact_dir=artifact_dir)) is duplicated in ~14 tests. A small fixture would remove the boilerplate and make the intent of each test clearer, and let individual tests override just the pieces they care about (e.g., fournos_config.yaml contents or FORGE_CONFIG_OVERRIDES).

♻️ Sketch
`@pytest.fixture`
def loaded_config(tmp_path: Path, monkeypatch: pytest.MonkeyPatch):
    def _load(overrides: str = "{}", fournos_config: str | None = None):
        monkeypatch.setenv("FORGE_CONFIG_OVERRIDES", overrides)
        artifact_dir = tmp_path / "artifacts"
        artifact_dir.mkdir(exist_ok=True)
        if fournos_config is not None:
            (tmp_path / "fournos_config.yaml").write_text(fournos_config, encoding="utf-8")
        return llmd_runtime.load_run_configuration(cwd=tmp_path, artifact_dir=artifact_dir), artifact_dir
    return _load
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm_d/test_runtime.py` around lines 25 - 101, Tests repeatedly set
FORGE_CONFIG_OVERRIDES, create artifact_dir, and call
llmd_runtime.load_run_configuration; extract that boilerplate into a pytest
fixture (e.g., loaded_config) that accepts optional overrides string and
fournos_config contents, uses tmp_path and monkeypatch to set the env var,
ensures artifact_dir exists, optionally writes fournos_config.yaml, and returns
(config, artifact_dir); update tests to call the fixture instead of duplicating
the three-line setup so tests can override only what they need while still
calling llmd_runtime.load_run_configuration from the fixture.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llm_d/test_runtime.py`:
- Around line 246-343: The mocks for llmd_runtime.resource_exists use the wrong
signature; update the test stubs (fake_resource_exists in the cleanup test and
the two 2-arg lambdas in the GPU operator tests) to match the real signature def
resource_exists(kind: str, name: str, *, namespace: str | None = None) by adding
the keyword-only marker and accepting namespace as a kw-only param (e.g., change
signatures to include "*, namespace: str | None = None" or use a lambda like
"lambda kind, name, *, namespace=None: ...") so calls with namespace=... won't
raise TypeError and the mock matches the real function shape.

---

Nitpick comments:
In `@tests/llm_d/test_runtime.py`:
- Around line 398-405: Replace the generator-expression trick used to make the
predicate raise in test_wait_until_reraises_runtime_error with a named helper
function: create a small function (e.g., _raise_terminal) that raises
RuntimeError("terminal failure") and pass that function as the predicate to
llmd_runtime.wait_until; update the similar occurrence mentioned (another test
around line 472) to use the same helper so the tests read clearly and survive
refactors.
- Around line 25-101: Tests repeatedly set FORGE_CONFIG_OVERRIDES, create
artifact_dir, and call llmd_runtime.load_run_configuration; extract that
boilerplate into a pytest fixture (e.g., loaded_config) that accepts optional
overrides string and fournos_config contents, uses tmp_path and monkeypatch to
set the env var, ensures artifact_dir exists, optionally writes
fournos_config.yaml, and returns (config, artifact_dir); update tests to call
the fixture instead of duplicating the three-line setup so tests can override
only what they need while still calling llmd_runtime.load_run_configuration from
the fixture.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f6d12fa-b324-4cd0-a64d-f25081c48e34

📥 Commits

Reviewing files that changed from the base of the PR and between 3721e71 and acd62ba.

📒 Files selected for processing (2)
  • projects/llm_d/orchestration/llmd_runtime.py
  • tests/llm_d/test_runtime.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • projects/llm_d/orchestration/llmd_runtime.py

Comment thread tests/llm_d/test_runtime.py
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 21, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
projects/llm_d/toolbox/test/main.py (1)

88-112: ⚠️ Potential issue | 🟠 Major

Wait for the old LLMInferenceService CR to finish deleting before reapplying.

This waits for pods to disappear, but the CR itself can still be terminating. Reapplying the same name can race finalizers/controllers.

Proposed fix
     llmd_runtime.wait_until(
         f"old llm-d pods to disappear in {namespace}",
         timeout_seconds=config.platform["inference_service"]["delete_timeout_seconds"],
         interval_seconds=10,
         predicate=_old_pods_gone,
     )
+
+    llmd_runtime.wait_until(
+        f"old llminferenceservice/{name} deletion in {namespace}",
+        timeout_seconds=config.platform["inference_service"]["delete_timeout_seconds"],
+        interval_seconds=10,
+        predicate=lambda: not llmd_runtime.resource_exists(
+            "llminferenceservice",
+            name,
+            namespace=namespace,
+        ),
+    )
 
     manifest = llmd_runtime.render_inference_service(config)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/test/main.py` around lines 88 - 112, The code waits
for old pods to disappear but not for the LLMInferenceService CR itself to be
deleted; update the deletion sequence so after calling llmd_runtime.oc("delete",
"llminferenceservice", name, ...) you also poll for the CR to be gone (e.g. add
a predicate function that calls llmd_runtime.oc_get_json("llminferenceservice",
name=name, namespace=namespace, ignore_not_found=True) and returns True when the
result is missing) and use llmd_runtime.wait_until with the same timeout
(config.platform["inference_service"]["delete_timeout_seconds"]) to wait for the
CR deletion before calling llmd_runtime.render_inference_service and
llmd_runtime.apply_manifest; reference the existing _old_pods_gone, wait_until,
render_inference_service, and apply_manifest symbols to locate where to insert
this additional wait.
projects/llm_d/orchestration/llmd_runtime.py (1)

1194-1208: ⚠️ Potential issue | 🟠 Major

Avoid root/privileged init containers for the GuideLLM copy pod.

Line 1204 still requests UID 0 and Line 1205 enables privilege escalation. On OpenShift clusters using restricted SCCs, this pod can be rejected before results are copied. Prefer non-root-compatible PVC permissions, pod-level fsGroup, or an explicit documented service account/SCC binding if root is truly required.

Read-only verification
#!/bin/bash
set -euo pipefail

python - <<'PY'
from pathlib import Path

path = Path("projects/llm_d/orchestration/llmd_runtime.py")
text = path.read_text(encoding="utf-8")

for needle in ('"runAsUser": 0', '"allowPrivilegeEscalation": True'):
    print(f"{needle}: {needle in text}")
PY

Expected: both values should be absent after the fix, unless this pod is intentionally tied to a privileged SCC.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/orchestration/llmd_runtime.py` around lines 1194 - 1208, The
initContainer "permission-fixer" currently requests UID 0 and enables privilege
escalation via the securityContext fields ("runAsUser": 0 and
"allowPrivilegeEscalation": True); remove those root/privileged settings from
the initContainers securityContext, instead run the container as a non-root UID
(e.g., set runAsUser to 1001 or remove per-container runAsUser) and remove
allowPrivilegeEscalation, and ensure the pod uses a podSecurityContext/fsGroup
(e.g., set podSecurityContext.fsGroup to 1001) or use non-root-compatible PVC
permissions or a documented service account/SCC binding if root is truly
required; update the "permission-fixer" and surrounding pod spec to reflect
these changes so the pod is accepted by restricted SCCs.
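A restricted-SCC-friendly shape for that pod spec might look like the fragment below. Field values such as the `fsGroup` GID and the image are illustrative assumptions, not project settings:

```python
# Pod-level context: fsGroup makes the mounted PVC group-writable, removing
# the need for a root "permission-fixer" init container.
pod_security_context = {
    "runAsNonRoot": True,
    "fsGroup": 1001,  # assumption: any non-root GID accepted by the SCC
    "seccompProfile": {"type": "RuntimeDefault"},
}

# Container-level context: no privilege escalation, all capabilities dropped.
container_security_context = {
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
}

copy_pod_spec = {
    "securityContext": pod_security_context,
    "containers": [
        {
            "name": "copy",
            "image": "registry.example/toolbox:latest",  # hypothetical image
            "securityContext": container_security_context,
        }
    ],
}

# No root UID and no escalation anywhere in the spec.
print("runAsUser" in pod_security_context)  # → False
```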
tests/llm_d/test_runtime.py (1)

268-268: ⚠️ Potential issue | 🟡 Minor

Match resource_exists test doubles to the real keyword-only signature.

The real helper accepts namespace as keyword-only. These stubs either allow it positionally or omit it, making tests less representative and brittle when call sites pass namespace=.

Proposed fix
-    def fake_resource_exists(kind: str, name: str, namespace: str | None = None) -> bool:
+    def fake_resource_exists(kind: str, name: str, *, namespace: str | None = None) -> bool:
         if kind == "namespace":
             return True
         return False
-    monkeypatch.setattr(llmd_runtime, "resource_exists", lambda kind, name: True)
+    monkeypatch.setattr(
+        llmd_runtime,
+        "resource_exists",
+        lambda kind, name, *, namespace=None: True,
+    )
-    monkeypatch.setattr(llmd_runtime, "resource_exists", lambda kind, name: False)
+    monkeypatch.setattr(
+        llmd_runtime,
+        "resource_exists",
+        lambda kind, name, *, namespace=None: False,
+    )

Also applies to: 326-326, 365-365

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm_d/test_runtime.py` at line 268, Test doubles for resource_exists
(e.g., fake_resource_exists) do not match the real function's keyword-only
`namespace` parameter; update their signatures to make `namespace` keyword-only
by inserting a bare * before it (for example: def fake_resource_exists(kind:
str, name: str, *, namespace: str | None = None) -> bool:) and apply the same
change to the other duplicates so call sites using `namespace=` behave the same
as the real helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/llm_d/toolbox/test/main.py`:
- Around line 41-69: The code always computes benchmark_name and deletes
benchmark resources even when config.benchmark is false; wrap the
llmd_runtime.oc calls that delete the benchmark job/pvc and the pod (the calls
that use benchmark_name and f"{benchmark_name}-copy") in a conditional that
checks config.benchmark is truthy before executing them (leave the
smoke_job_name deletion as-is). This ensures benchmark-related cleanup
(references: benchmark_name variable and the llmd_runtime.oc calls deleting
"job,pvc" and "pod" with f"{benchmark_name}-copy") only runs for non-smoke
benchmark runs.
- Around line 269-294: The delete calls for Job/PVC/copy Pod
(llmd_runtime.oc(... "delete", "job,pvc", benchmark_name, ... ) and
llmd_runtime.oc(... "delete", "pod", f"{benchmark_name}-copy", ... )) race with
immediate applies (llmd_runtime.apply_manifest using render_guidellm_pvc and
render_guidellm_job), so update the flow to wait until the deleted resources are
fully removed/terminated before calling apply; implement a short polling loop
(or use a blocking delete equivalent) that checks resource existence/status for
benchmark_name in namespace and only proceeds to call
llmd_runtime.apply_manifest with render_guidellm_pvc and render_guidellm_job
(passing endpoint_url) once the Job, PVC and copy Pod are absent or no longer in
Terminating state.

---

Duplicate comments:
In `@projects/llm_d/orchestration/llmd_runtime.py`:
- Around line 1194-1208: The initContainer "permission-fixer" currently requests
UID 0 and enables privilege escalation via the securityContext fields
("runAsUser": 0 and "allowPrivilegeEscalation": True); remove those
root/privileged settings from the initContainers securityContext, instead run
the container as a non-root UID (e.g., set runAsUser to 1001 or remove
per-container runAsUser) and remove allowPrivilegeEscalation, and ensure the pod
uses a podSecurityContext/fsGroup (e.g., set podSecurityContext.fsGroup to 1001)
or use non-root-compatible PVC permissions or a documented service account/SCC
binding if root is truly required; update the "permission-fixer" and surrounding
pod spec to reflect these changes so the pod is accepted by restricted SCCs.

In `@projects/llm_d/toolbox/test/main.py`:
- Around line 88-112: The code waits for old pods to disappear but not for the
LLMInferenceService CR itself to be deleted; update the deletion sequence so
after calling llmd_runtime.oc("delete", "llminferenceservice", name, ...) you
also poll for the CR to be gone (e.g. add a predicate function that calls
llmd_runtime.oc_get_json("llminferenceservice", name=name, namespace=namespace,
ignore_not_found=True) and returns True when the result is missing) and use
llmd_runtime.wait_until with the same timeout
(config.platform["inference_service"]["delete_timeout_seconds"]) to wait for the
CR deletion before calling llmd_runtime.render_inference_service and
llmd_runtime.apply_manifest; reference the existing _old_pods_gone, wait_until,
render_inference_service, and apply_manifest symbols to locate where to insert
this additional wait.

In `@tests/llm_d/test_runtime.py`:
- Line 268: Test doubles for resource_exists (e.g., fake_resource_exists) do not
match the real function's keyword-only `namespace` parameter; update their
signatures to make `namespace` keyword-only by inserting a bare * before it (for
example: def fake_resource_exists(kind: str, name: str, *, namespace: str | None
= None) -> bool:) and apply the same change to the other duplicates so call
sites using `namespace=` behave the same as the real helper.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bfa87b46-6cb6-49be-88ca-025491cae9ef

📥 Commits

Reviewing files that changed from the base of the PR and between acd62ba and d155341.

📒 Files selected for processing (5)
  • projects/llm_d/orchestration/config.d/platform.yaml
  • projects/llm_d/orchestration/llmd_runtime.py
  • projects/llm_d/toolbox/prepare/main.py
  • projects/llm_d/toolbox/test/main.py
  • tests/llm_d/test_runtime.py
✅ Files skipped from review due to trivial changes (1)
  • projects/llm_d/orchestration/config.d/platform.yaml

Comment on lines +41 to +69
benchmark_name = config.benchmark["job_name"] if config.benchmark else "guidellm-benchmark"
smoke_job_name = config.platform["smoke"]["job_name"]
llmd_runtime.oc(
    "delete",
    "job",
    smoke_job_name,
    "-n",
    namespace,
    "--ignore-not-found=true",
    check=False,
)
llmd_runtime.oc(
    "delete",
    "job,pvc",
    benchmark_name,
    "-n",
    namespace,
    "--ignore-not-found=true",
    check=False,
)
llmd_runtime.oc(
    "delete",
    "pod",
    f"{benchmark_name}-copy",
    "-n",
    namespace,
    "--ignore-not-found=true",
    check=False,
)

⚠️ Potential issue | 🟠 Major

Don’t delete benchmark resources during smoke-only runs.

When config.benchmark is false, this still deletes job,pvc guidellm-benchmark and pod guidellm-benchmark-copy in the shared namespace. That can remove benchmark results/resources from a different run.

Proposed fix
-        benchmark_name = config.benchmark["job_name"] if config.benchmark else "guidellm-benchmark"
+        benchmark_name = config.benchmark["job_name"] if config.benchmark else None
         smoke_job_name = config.platform["smoke"]["job_name"]
         llmd_runtime.oc(
             "delete",
             "job",
             smoke_job_name,
@@
             "--ignore-not-found=true",
             check=False,
         )
-        llmd_runtime.oc(
-            "delete",
-            "job,pvc",
-            benchmark_name,
-            "-n",
-            namespace,
-            "--ignore-not-found=true",
-            check=False,
-        )
-        llmd_runtime.oc(
-            "delete",
-            "pod",
-            f"{benchmark_name}-copy",
-            "-n",
-            namespace,
-            "--ignore-not-found=true",
-            check=False,
-        )
+        if benchmark_name:
+            llmd_runtime.oc(
+                "delete",
+                "job,pvc",
+                benchmark_name,
+                "-n",
+                namespace,
+                "--ignore-not-found=true",
+                check=False,
+            )
+            llmd_runtime.oc(
+                "delete",
+                "pod",
+                f"{benchmark_name}-copy",
+                "-n",
+                namespace,
+                "--ignore-not-found=true",
+                check=False,
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/test/main.py` around lines 41 - 69, The code always
computes benchmark_name and deletes benchmark resources even when
config.benchmark is false; wrap the llmd_runtime.oc calls that delete the
benchmark job/pvc and the pod (the calls that use benchmark_name and
f"{benchmark_name}-copy") in a conditional that checks config.benchmark is
truthy before executing them (leave the smoke_job_name deletion as-is). This
ensures benchmark-related cleanup (references: benchmark_name variable and the
llmd_runtime.oc calls deleting "job,pvc" and "pod" with
f"{benchmark_name}-copy") only runs for non-smoke benchmark runs.

Comment on lines +269 to +294
llmd_runtime.oc(
    "delete",
    "job,pvc",
    benchmark_name,
    "-n",
    namespace,
    "--ignore-not-found=true",
    check=False,
)
llmd_runtime.oc(
    "delete",
    "pod",
    f"{benchmark_name}-copy",
    "-n",
    namespace,
    "--ignore-not-found=true",
    check=False,
)

llmd_runtime.apply_manifest(
    config.artifact_dir / "src" / "guidellm-pvc.yaml",
    llmd_runtime.render_guidellm_pvc(config),
)
llmd_runtime.apply_manifest(
    config.artifact_dir / "src" / "guidellm-job.yaml",
    llmd_runtime.render_guidellm_job(config, endpoint_url),

⚠️ Potential issue | 🟠 Major

Wait for benchmark resources to terminate before recreating them.

The benchmark flow deletes the old Job/PVC/copy Pod and immediately reapplies the PVC/Job. If any resource is still terminating, the apply can fail or bind to stale state.

Proposed fix
     llmd_runtime.oc(
         "delete",
         "pod",
         f"{benchmark_name}-copy",
@@
         "--ignore-not-found=true",
         check=False,
     )
+
+    for kind, resource_name in (
+        ("job", benchmark_name),
+        ("persistentvolumeclaim", benchmark_name),
+        ("pod", f"{benchmark_name}-copy"),
+    ):
+        llmd_runtime.wait_until(
+            f"{kind}/{resource_name} deletion in {namespace}",
+            timeout_seconds=config.benchmark["timeout_seconds"],
+            interval_seconds=5,
+            predicate=lambda kind=kind, resource_name=resource_name: not llmd_runtime.resource_exists(
+                kind,
+                resource_name,
+                namespace=namespace,
+            ),
+        )
 
     llmd_runtime.apply_manifest(
         config.artifact_dir / "src" / "guidellm-pvc.yaml",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm_d/toolbox/test/main.py` around lines 269 - 294, The delete calls
for Job/PVC/copy Pod (llmd_runtime.oc(... "delete", "job,pvc", benchmark_name,
... ) and llmd_runtime.oc(... "delete", "pod", f"{benchmark_name}-copy", ... ))
race with immediate applies (llmd_runtime.apply_manifest using
render_guidellm_pvc and render_guidellm_job), so update the flow to wait until
the deleted resources are fully removed/terminated before calling apply;
implement a short polling loop (or use a blocking delete equivalent) that checks
resource existence/status for benchmark_name in namespace and only proceeds to
call llmd_runtime.apply_manifest with render_guidellm_pvc and
render_guidellm_job (passing endpoint_url) once the Job, PVC and copy Pod are
absent or no longer in Terminating state.
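The polling wait this asks for is a standard deadline loop. The sketch below assumes `wait_until`'s signature roughly mirrors llmd_runtime's (description, timeout, interval, predicate) but is written from scratch:

```python
import time

def wait_until(description, *, timeout_seconds, interval_seconds=5,
               predicate, clock=time.monotonic, sleep=time.sleep):
    # Re-evaluate predicate until it returns True or the deadline passes.
    deadline = clock() + timeout_seconds
    while True:
        if predicate():
            return
        if clock() >= deadline:
            raise TimeoutError(f"timed out waiting for {description}")
        sleep(interval_seconds)

# Example: treat a resource as gone after the third poll.
state = {"polls": 0}

def benchmark_pvc_gone() -> bool:
    state["polls"] += 1
    return state["polls"] >= 3

wait_until("pvc deletion", timeout_seconds=5, interval_seconds=0,
           predicate=benchmark_pvc_gone)
print(state["polls"])  # → 3
```

Injecting `clock` and `sleep` keeps the loop unit-testable without real delays.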

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/llm_d/test_runtime.py (1)

309-406: ⚠️ Potential issue | 🟡 Minor

Keep resource_exists test doubles aligned with the real signature.

The real helper takes namespace as keyword-only, so these mocks should accept *, namespace=None; the two-arg lambdas will raise if the code starts passing namespace=....

Proposed fix
-    def fake_resource_exists(kind: str, name: str, namespace: str | None = None) -> bool:
+    def fake_resource_exists(
+        kind: str,
+        name: str,
+        *,
+        namespace: str | None = None,
+    ) -> bool:
         if kind == "namespace":
             return True
         return False
-    monkeypatch.setattr(llmd_runtime, "resource_exists", lambda kind, name: True)
+    monkeypatch.setattr(
+        llmd_runtime,
+        "resource_exists",
+        lambda kind, name, *, namespace=None: True,
+    )
-    monkeypatch.setattr(llmd_runtime, "resource_exists", lambda kind, name: False)
+    monkeypatch.setattr(
+        llmd_runtime,
+        "resource_exists",
+        lambda kind, name, *, namespace=None: False,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm_d/test_runtime.py` around lines 309 - 406, Update the test doubles
for llmd_runtime.resource_exists to match the real signature by accepting
namespace as a keyword-only parameter; replace any lambda or helper like
fake_resource_exists(kind, name, namespace=None) or lambda kind, name: ... with
signatures that include "*, namespace=None" (e.g. def fake_resource_exists(kind,
name, *, namespace=None): ...), and change the two-arg lambdas in
test_prepare_gpu_operator_skips_existing_clusterpolicy and
test_prepare_gpu_operator_bootstraps_missing_clusterpolicy (and any other
two-arg mocks) to accept the keyword-only namespace parameter so calls like
resource_exists(kind, name, namespace=...) won't raise.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63c80471-de67-42d7-9b97-04b83e8d8b59

📥 Commits

Reviewing files that changed from the base of the PR and between d155341 and 72352f4.

📒 Files selected for processing (7)
  • projects/llm_d/orchestration/config.d/runtime.yaml
  • projects/llm_d/orchestration/config.d/scheduler_profiles.yaml
  • projects/llm_d/orchestration/llmd_runtime.py
  • projects/llm_d/orchestration/presets.d/presets.yaml
  • projects/llm_d/orchestration/scheduler_profiles/approximate.yaml
  • projects/llm_d/orchestration/scheduler_profiles/precise.yaml
  • tests/llm_d/test_runtime.py
✅ Files skipped from review due to trivial changes (5)
  • projects/llm_d/orchestration/scheduler_profiles/approximate.yaml
  • projects/llm_d/orchestration/scheduler_profiles/precise.yaml
  • projects/llm_d/orchestration/config.d/scheduler_profiles.yaml
  • projects/llm_d/orchestration/presets.d/presets.yaml
  • projects/llm_d/orchestration/llmd_runtime.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • projects/llm_d/orchestration/config.d/runtime.yaml

from projects.llm_d.toolbox.prepare_model_cache import main as prepare_model_cache_toolbox
from projects.llm_d.toolbox.test import main as test_toolbox



⚠️ Potential issue | 🟠 Major

Isolate Forge environment variables for the whole test module.

Most tests assume the default preset/config, but load_run_configuration() reads ambient FORGE_PRESET/FORGE_JOB_NAME and mutates ARTIFACT_DIR. A developer or CI environment with these variables set can make these tests unexpectedly select a different preset.

Proposed fixture
 from projects.llm_d.toolbox.test import main as test_toolbox
 
 
+@pytest.fixture(autouse=True)
+def _isolate_forge_env(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.delenv("ARTIFACT_DIR", raising=False)
+    monkeypatch.delenv("FORGE_JOB_NAME", raising=False)
+    monkeypatch.delenv("FORGE_PRESET", raising=False)
+    monkeypatch.setenv("FORGE_CONFIG_OVERRIDES", "{}")
+
+
 def test_derive_namespace_uses_prefix_once() -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm_d/test_runtime.py` at line 14, Add a pytest fixture named e.g.
isolate_forge_env with scope="module" and autouse=True that uses monkeypatch to
delete/unset FORGE_PRESET and FORGE_JOB_NAME and to set ARTIFACT_DIR to a fresh
temp directory (via tmp_path_factory) before any tests run; this ensures
load_run_configuration() will not pick up ambient FORGE_* vars and that
ARTIFACT_DIR mutations are confined to the test module. Make sure the fixture is
placed in tests/llm_d/test_runtime.py (or conftest.py if you prefer) and
references load_run_configuration and ARTIFACT_DIR behavior indirectly by
preparing the environment before test import/execution.
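The proposed fixture can be sketched as follows (an assumption-level sketch: the `FORGE_*` variable names and the `ARTIFACT_DIR` behavior are taken from the review comment, not verified against the repository). The tail of the block also exercises the same isolation outside a test run, which works because `pytest.MonkeyPatch` is documented as directly instantiable:

```python
import os
import tempfile

import pytest


@pytest.fixture(autouse=True)
def _isolate_forge_env(monkeypatch: pytest.MonkeyPatch, tmp_path) -> None:
    # Drop ambient Forge settings so load_run_configuration() sees the
    # defaults, and confine ARTIFACT_DIR writes to a per-test temp dir.
    monkeypatch.delenv("FORGE_PRESET", raising=False)
    monkeypatch.delenv("FORGE_JOB_NAME", raising=False)
    monkeypatch.setenv("ARTIFACT_DIR", str(tmp_path))


# Demonstrate the isolation mechanics without a pytest run:
os.environ["FORGE_PRESET"] = "ambient-preset"  # simulate a polluted CI shell
mp = pytest.MonkeyPatch()
mp.delenv("FORGE_PRESET", raising=False)
mp.setenv("ARTIFACT_DIR", tempfile.mkdtemp())
assert "FORGE_PRESET" not in os.environ
mp.undo()  # in a real test, monkeypatch undoes this at teardown
assert os.environ["FORGE_PRESET"] == "ambient-preset"
print("ambient FORGE_PRESET restored after undo")
```

Because the fixture is `autouse=True`, every test in the module gets the clean environment without opting in, which is exactly the property the review is after.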

@albertoperdomo2 albertoperdomo2 changed the title WIP - feat: Downstream llm-d feat: Downstream llm-d Apr 21, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2026
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 21, 2026