feat: expose CodeInterpreter warm pool health#385
Conversation
Signed-off-by: ranxi2001 <ranxi169@163.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @ranxi2001! It looks like this is your first PR to volcano-sh/agentcube 🎉 |
There was a problem hiding this comment.
Code Review
This pull request enhances the CodeInterpreterReconciler to track and report the status of the associated SandboxWarmPool using a new WarmPoolAvailable condition, as well as recording warning events when the pool is empty or below its watermark. Unit tests have been added to verify this behavior. The review feedback suggests ignoring the WarmPoolNotFound reason when recording warning events to prevent transient warning spam caused by asynchronous cache delays during initial resource creation.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if current.Reason != codeInterpreterWarmPoolEmpty && | ||
| current.Reason != codeInterpreterWarmPoolBelowWatermark && | ||
| current.Reason != codeInterpreterWarmPoolNotFound { | ||
| return false | ||
| } |
There was a problem hiding this comment.
Transient Warning Event on Creation / Cache Delay
During the first reconciliation of a newly created CodeInterpreter, ensureSandboxWarmPool will create the SandboxWarmPool resource. However, because the controller's client uses an asynchronous cache (informer), the subsequent r.Get call in warmPoolAvailableCondition (executed in the same reconciliation loop) will likely return a NotFound error because the cache has not yet synced with the newly created resource.
This causes the controller to transiently set the condition to WarmPoolNotFound and emit a Warning event. Once the cache syncs a moment later, a new reconciliation is triggered, and the status is updated to WarmPoolEmpty or WarmPoolReady.
Since the controller automatically and immediately self-heals by creating/recreating the SandboxWarmPool when it is missing, WarmPoolNotFound is always a transient state. Emitting a Warning event for it is noisy and unnecessary.
We should avoid recording warning events for WarmPoolNotFound to prevent spamming warning events on every initial creation of a CodeInterpreter.
if current.Reason != codeInterpreterWarmPoolEmpty &&
current.Reason != codeInterpreterWarmPoolBelowWatermark {
return false
}There was a problem hiding this comment.
Thanks, updated in the latest push.
Change:
- Kept the
WarmPoolNotFoundcondition for status visibility. - Stopped recording a Warning Event for
WarmPoolNotFound, since it can be a transient cache-delay state right after the controller creates theSandboxWarmPool. - Added a unit test to verify that
WarmPoolNotFounddoes not emit a warning event.
Validation:
/tmp/go-toolchain/go/bin/go test -v ./pkg/workloadmanager -run 'TestShouldRecordWarmPoolWarningEvent|TestReconcileReportsWarmPoolEmptyInsteadOfOnlyReady|TestReconcileReportsWarmPoolBelowWatermark|TestReconcileUpdatesWarmPoolAvailableWhenPoolRecovers'/tmp/go-toolchain/go/bin/go test ./pkg/workloadmanagergit diff --check
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #385 +/- ##
===========================================
+ Coverage 47.57% 61.15% +13.58%
===========================================
Files 30 34 +4
Lines 2819 3249 +430
===========================================
+ Hits 1341 1987 +646
+ Misses 1338 1051 -287
- Partials 140 211 +71
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Signed-off-by: ranxi2001 <ranxi169@163.com>
Signed-off-by: ranxi2001 <ranxi169@163.com>
3b07619 to
d885b4e
Compare
What type of PR is this?
/kind enhancement
What this PR does / why we need it:
This PR surfaces SandboxWarmPool health on CodeInterpreter status.
Before this change, CodeInterpreter could report Ready=true after reconciliation even when the underlying SandboxWarmPool had zero or too few ready replicas. In that state, session creation may fall back to cold start or fail, but users cannot diagnose the warm pool health from the CodeInterpreter status.
This change adds:
Which issue(s) this PR fixes:
Fixes #265
Special notes for your reviewer:
Tests:
/tmp/go-toolchain/go/bin/go test ./cmd/workload-manager ./pkg/workloadmanager/tmp/go-toolchain/go/bin/go test -v ./pkg/workloadmanager/tmp/go-toolchain/go/bin/go test -v ./pkg/workloadmanager -run 'TestReconcileReportsWarmPoolEmptyInsteadOfOnlyReady|TestReconcileReportsWarmPoolBelowWatermark|TestReconcileUpdatesWarmPoolAvailableWhenPoolRecovers'git diff --checkAlso attempted:
PATH=/tmp/go-toolchain/go/bin:$PATH make testmake testdid not complete locally because the e2e tests require a running router/workload-manager and kubeconfig. The failure was intest/e2ewithconnect: connection refusedand missing kubeconfig errors.Does this PR introduce a user-facing change?