Skip to content

feat: expose CodeInterpreter warm pool health#385

Open
ranxi2001 wants to merge 3 commits into
volcano-sh:mainfrom
ranxi2001:feat/warmpool-available-condition
Open

feat: expose CodeInterpreter warm pool health#385
ranxi2001 wants to merge 3 commits into
volcano-sh:mainfrom
ranxi2001:feat/warmpool-available-condition

Conversation

@ranxi2001

Copy link
Copy Markdown

What type of PR is this?

/kind enhancement

What this PR does / why we need it:

This PR surfaces SandboxWarmPool health on CodeInterpreter status.

Before this change, CodeInterpreter could report Ready=true after reconciliation even when the underlying SandboxWarmPool had zero or too few ready replicas. In that state, session creation may fall back to cold start or fail, but users cannot diagnose the warm pool health from the CodeInterpreter status.

This change adds:

  • a WarmPoolAvailable condition on CodeInterpreter status
  • ReadyReplicas-based health evaluation for the owned SandboxWarmPool
  • Warning Events when the warm pool is missing, empty, or below the low watermark
  • a watch on owned SandboxWarmPool resources so warm pool status changes requeue CodeInterpreter reconciliation

Which issue(s) this PR fixes:

Fixes #265

Special notes for your reviewer:

  • No new CRD fields are added; this reuses CodeInterpreterStatus.Conditions.
  • The low watermark is currently ceil(warmPoolSize / 2), matching the 50% desired threshold proposed in the issue.
  • A newly created warm pool with ReadyReplicas=0 emits a WarmPoolEmpty warning once. If the preferred behavior is to warn only after a previously ready pool degrades, I can adjust the event policy.
  • No code generation is required.

Tests:

  • /tmp/go-toolchain/go/bin/go test ./cmd/workload-manager ./pkg/workloadmanager
  • /tmp/go-toolchain/go/bin/go test -v ./pkg/workloadmanager
  • /tmp/go-toolchain/go/bin/go test -v ./pkg/workloadmanager -run 'TestReconcileReportsWarmPoolEmptyInsteadOfOnlyReady|TestReconcileReportsWarmPoolBelowWatermark|TestReconcileUpdatesWarmPoolAvailableWhenPoolRecovers'
  • git diff --check

Also attempted:

  • PATH=/tmp/go-toolchain/go/bin:$PATH make test

make test did not complete locally because the e2e tests require a running router/workload-manager and kubeconfig. The failure was in test/e2e with connect: connection refused and missing kubeconfig errors.

Does this PR introduce a user-facing change?

CodeInterpreter status now reports SandboxWarmPool health through a WarmPoolAvailable condition and warning Events.

Signed-off-by: ranxi2001 <ranxi169@163.com>
Copilot AI review requested due to automatic review settings June 15, 2026 06:43
@volcano-sh-bot volcano-sh-bot added the kind/enhancement New feature or request label Jun 15, 2026
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot

Copy link
Copy Markdown
Contributor

Welcome @ranxi2001! It looks like this is your first PR to volcano-sh/agentcube 🎉

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the CodeInterpreterReconciler to track and report the status of the associated SandboxWarmPool using a new WarmPoolAvailable condition, as well as recording warning events when the pool is empty or below its watermark. Unit tests have been added to verify this behavior. The review feedback suggests ignoring the WarmPoolNotFound reason when recording warning events to prevent transient warning spam caused by asynchronous cache delays during initial resource creation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +216 to +220
if current.Reason != codeInterpreterWarmPoolEmpty &&
current.Reason != codeInterpreterWarmPoolBelowWatermark &&
current.Reason != codeInterpreterWarmPoolNotFound {
return false
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Transient Warning Event on Creation / Cache Delay

During the first reconciliation of a newly created CodeInterpreter, ensureSandboxWarmPool will create the SandboxWarmPool resource. However, because the controller's client uses an asynchronous cache (informer), the subsequent r.Get call in warmPoolAvailableCondition (executed in the same reconciliation loop) will likely return a NotFound error because the cache has not yet synced with the newly created resource.

This causes the controller to transiently set the condition to WarmPoolNotFound and emit a Warning event. Once the cache syncs a moment later, a new reconciliation is triggered, and the status is updated to WarmPoolEmpty or WarmPoolReady.

Since the controller automatically and immediately self-heals by creating/recreating the SandboxWarmPool when it is missing, WarmPoolNotFound is always a transient state. Emitting a Warning event for it is noisy and unnecessary.

We should avoid recording warning events for WarmPoolNotFound to prevent spamming warning events on every initial creation of a CodeInterpreter.

	if current.Reason != codeInterpreterWarmPoolEmpty &&
		current.Reason != codeInterpreterWarmPoolBelowWatermark {
		return false
	}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated in the latest push.

Change:

  • Kept the WarmPoolNotFound condition for status visibility.
  • Stopped recording a Warning Event for WarmPoolNotFound, since it can be a transient cache-delay state right after the controller creates the SandboxWarmPool.
  • Added a unit test to verify that WarmPoolNotFound does not emit a warning event.

Validation:

  • /tmp/go-toolchain/go/bin/go test -v ./pkg/workloadmanager -run 'TestShouldRecordWarmPoolWarningEvent|TestReconcileReportsWarmPoolEmptyInsteadOfOnlyReady|TestReconcileReportsWarmPoolBelowWatermark|TestReconcileUpdatesWarmPoolAvailableWhenPoolRecovers'
  • /tmp/go-toolchain/go/bin/go test ./pkg/workloadmanager
  • git diff --check

@codecov-commenter

codecov-commenter commented Jun 15, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 96.10390% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.15%. Comparing base (524e55e) to head (d885b4e).
⚠️ Report is 125 commits behind head on main.

Files with missing lines Patch % Lines
pkg/workloadmanager/codeinterpreter_controller.go 96.10% 2 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #385       +/-   ##
===========================================
+ Coverage   47.57%   61.15%   +13.58%     
===========================================
  Files          30       34        +4     
  Lines        2819     3249      +430     
===========================================
+ Hits         1341     1987      +646     
+ Misses       1338     1051      -287     
- Partials      140      211       +71     
Flag Coverage Δ
unittests 61.15% <96.10%> (+13.58%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: ranxi2001 <ranxi169@163.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@RainbowMango RainbowMango left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Signed-off-by: ranxi2001 <ranxi169@163.com>
@ranxi2001 ranxi2001 force-pushed the feat/warmpool-available-condition branch from 3b07619 to d885b4e Compare June 16, 2026 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement New feature or request size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: observe SandboxWarmPool health in CodeInterpreter status (WarmPoolAvailable condition + events)

5 participants