optim(runtime): optimize deletion logic for datasets and runtimes#5790
optim(runtime): optimize deletion logic for datasets and runtimes#5790TrafalgarZZZ wants to merge 7 commits intofluid-cloudnative:masterfrom
Conversation
Signed-off-by: TzZtzt <trafalgarz@outlook.com>
Signed-off-by: TzZtzt <trafalgarz@outlook.com>
Signed-off-by: TzZtzt <trafalgarz@outlook.com>
- Introduce LegacyEnvBlockInUseDatasetDeletion to optionally block dataset deletion if pods use it - Add LegacyEnvForceCleanUpManagedPVC to control forced cleanup of PVC protection finalizers - Wrap legacy dataset deletion check with environment flag to skip it by default - Mark legacy mechanisms for future removal and provide warning comments Signed-off-by: TzZtzt <trafalgarz@outlook.com>
Signed-off-by: TzZtzt <trafalgarz@outlook.com>
Signed-off-by: TzZtzt <trafalgarz@outlook.com>
Signed-off-by: TzZtzt <trafalgarz@outlook.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
There was a problem hiding this comment.
Code Review
This pull request introduces legacy environment variables for backward compatibility in dataset deletion and PVC cleanup, refactors PVC deletion to use exponential backoff, and adds watches for PVCs and PVs to trigger reconciliation. Feedback focuses on increasing the 1-second confirmation timeout, aligning backoff parameters, replacing hardcoded annotation strings with constants, and documenting the risks of using UnsafeDisableDeepCopy in shared utilities.
| stillFound := false | ||
| retries := 10 | ||
| for i := 0; i < retries; i++ { | ||
| ctx, cancelFunc := context.WithTimeout(context.Background(), 1*time.Second) |
There was a problem hiding this comment.
The 1-second timeout for confirming PVC deletion is extremely short. Kubernetes resource deletion often takes several seconds to propagate through controllers and be reflected in the API. A 1-second window will likely cause frequent timeouts and reconciliation requeues under normal cluster load. Consider increasing this to 5 seconds for better robustness.
| ctx, cancelFunc := context.WithTimeout(context.Background(), 1*time.Second) | |
| ctx, cancelFunc := context.WithTimeout(context.Background(), 5*time.Second) |
| backoff := wait.Backoff{Duration: 100 * time.Millisecond, Steps: 10, Jitter: 0.2} | ||
| err := wait.ExponentialBackoffWithContext(ctx, backoff, func(ctx context.Context) (done bool, err error) { |
There was a problem hiding this comment.
The backoff configuration (10 steps of 100ms) sums up to the 1-second context timeout. Due to API call latency, the later steps will likely be cut off by the context expiration. If the timeout is increased, consider also adjusting the backoff duration or steps to ensure they fit within the new timeout window.
| "namespace", runtime.GetNamespace()) | ||
| if err != nil { | ||
| if wait.Interrupted(err) { | ||
| return errors.Wrapf(err, "timeout waiting for PVC %s to be deleted after 1-second retry", runtime.GetName()) |
There was a problem hiding this comment.
The error message should reflect the actual timeout duration used in the context.
| return errors.Wrapf(err, "timeout waiting for PVC %s to be deleted after 1-second retry", runtime.GetName()) | |
| return errors.Wrapf(err, "timeout waiting for PVC %s to be deleted after 5-second retry", runtime.GetName()) |
| if !ok { | ||
| return false | ||
| } | ||
| if len(pvc.Annotations) > 0 && pvc.Annotations["CreatedBy"] == "fluid" { |
There was a problem hiding this comment.
| return []reconcile.Request{} | ||
| } | ||
|
|
||
| if len(pv.Annotations) > 0 && pv.Annotations["CreatedBy"] == "fluid" { |
| nsPods := corev1.PodList{} | ||
| err := e.List(context.TODO(), &nsPods, &client.ListOptions{ | ||
| Namespace: namespace, | ||
| UnsafeDisableDeepCopy: ptr.To(true), |
There was a problem hiding this comment.
Using UnsafeDisableDeepCopy in a public utility function like GetPvcMountPods is risky. Since the returned objects point directly to the cache's internal state, any caller that inadvertently modifies a Pod object will corrupt the cache. While the current usage in this PR is safe, this optimization should be documented clearly to warn future users of this utility.
There was a problem hiding this comment.
Pull request overview
This PR optimizes Fluid’s dataset/runtime deletion flow for high-rate delete scenarios by avoiding expensive pod listing in common cases, reducing list deep-copy overhead, and adding event-driven requeues tied to volume lifecycle events.
Changes:
- Disable two legacy deletion mechanisms by default (in-use dataset delete protection; forced PVC finalizer cleanup), with opt-in env flags for backward compatibility.
- Reduce memory usage during pod listing by enabling
UnsafeDisableDeepCopyinGetPvcMountPods. - Add PV/PVC delete watches to trigger runtime reconciles when volumes disappear, and remove the now-unused
GetPvcMountNodeshelper and related tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/utils/kubeclient/volume_test.go | Removes unit tests for GetPvcMountNodes (function removed). |
| pkg/utils/kubeclient/volume_claim_test.go | Removes additional GetPvcMountNodes unit tests. |
| pkg/utils/kubeclient/volume.go | Enables UnsafeDisableDeepCopy for pod listing; removes GetPvcMountNodes. |
| pkg/utils/dataset/volume/delete.go | Reworks PVC deletion wait loop, gates forced finalizer removal behind legacy env var. |
| pkg/ctrl/watch/manager.go | Adds PVC/PV delete watches to enqueue runtime reconciliations when volumes are deleted. |
| pkg/controllers/v1alpha1/dataset/dataset_controller.go | Gates “block in-use dataset deletion” behind legacy env var (disabled by default). |
| pkg/common/env_names.go | Adds env var names for legacy mechanism toggles. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // WARN: This is a LEGACY MECHANISM and will be removed in the future. | ||
| // force deletion of pvc-protection finalizer will not be done, we'll wait until the pvc is really deleted by the PV controller. | ||
| if utils.GetBoolValueFromEnv(common.LegacyEnvForceCleanUpManagedPVC, false) { | ||
| should, err := kubeclient.ShouldRemoveProtectionFinalizer(client, runtime.GetName(), runtime.GetNamespace()) | ||
| if err != nil { | ||
| // ignore NotFound error and re-check existence if the pvc is already deleted | ||
| if utils.IgnoreNotFound(err) == nil { | ||
| continue | ||
| return false, nil | ||
| } | ||
| } | ||
|
|
||
| if should { | ||
| log.Info("Should forcibly remove pvc-protection finalizer") | ||
| err = kubeclient.RemoveProtectionFinalizer(client, runtime.GetName(), runtime.GetNamespace()) | ||
| if err != nil { | ||
| // ignore NotFound error and re-check existence if the pvc is already deleted | ||
| if utils.IgnoreNotFound(err) == nil { | ||
| return false, nil | ||
| } | ||
| log.Info("Failed to remove finalizers", "name", runtime.GetName(), "namespace", runtime.GetNamespace()) | ||
| return false, err | ||
| } | ||
| log.Info("Failed to remove finalizers", "name", runtime.GetName(), "namespace", runtime.GetNamespace()) | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
The PVC finalizer force-cleanup path is now gated behind LEGACY_MECHANISM_FORCE_CLEAN_UP_MANAGED_PVC, which changes the behavior of existing unit tests that assume the pvc-protection finalizer will be removed automatically. For example, pkg/utils/dataset/volume/delete_test.go has scenarios like "PVC is stuck terminating with pvc-protection finalizer" that will now time out unless the env var is set. Please update the tests (set/unset the env var per scenario and adjust assertions) to match the new default behavior.
| if err != nil { | ||
| if wait.Interrupted(err) { | ||
| return errors.Wrapf(err, "timeout waiting for PVC %s to be deleted after 1-second retry", runtime.GetName()) | ||
| } |
There was a problem hiding this comment.
The timeout error message loses important context and no longer includes the PVC namespace (the earlier error messages in this file include both name and namespace). Consider including both name and namespace, and avoid hardcoding "1-second" in the message unless it exactly matches the configured timeout/backoff values.
| DeleteFunc: func(de event.DeleteEvent) bool { | ||
| pvc, ok := de.Object.(*corev1.PersistentVolumeClaim) | ||
| if !ok { | ||
| return false | ||
| } | ||
| if len(pvc.Annotations) > 0 && pvc.Annotations["CreatedBy"] == "fluid" { | ||
| return true | ||
| } | ||
| return false |
There was a problem hiding this comment.
The annotation filter is hardcoded as pvc.Annotations["CreatedBy"] == "fluid". Since the codebase already centralizes this in common.GetExpectedFluidAnnotations(), it’s easy for these literals to drift from the canonical key/value. Please reuse the common helper/constant here (and in the PV watch below) to keep the filter consistent across the project.
| err = c.Watch(source.Kind(mgr.GetCache(), &corev1.PersistentVolume{}), handler.EnqueueRequestsFromMapFunc(pvToRuntimeReqFn), predicate.Funcs{ | ||
| CreateFunc: func(ce event.CreateEvent) bool { return false }, | ||
| UpdateFunc: func(ue event.UpdateEvent) bool { return false }, | ||
| DeleteFunc: func(de event.DeleteEvent) bool { return true }, | ||
| }) |
There was a problem hiding this comment.
DeleteFunc for PersistentVolume events currently returns true for all PV deletions, even though the mapping function only enqueues requests for Fluid-managed PVs. This causes the controller to process every PV delete event cluster-wide. Consider filtering in the predicate as well (e.g., only pass events where the PV has Fluid annotations / a non-nil ClaimRef) to reduce unnecessary work under high PV churn.
| EnvFuseSidecarInjectionMode = "FUSE_SIDECAR_INJECTION_MODE" | ||
| ) | ||
|
|
||
| // Env names that related to legacy mechanisms for backward compatibility |
There was a problem hiding this comment.
Grammar: "Env names that related to legacy mechanisms" should be "Env names that are related to legacy mechanisms".
| // Env names that related to legacy mechanisms for backward compatibility | |
| // Env names that are related to legacy mechanisms for backward compatibility |



Ⅰ. Describe what this PR does
This PR optimize deletion logic for datasets and runtimes especially for handling high-rate deletion requests. Pod listing may consume large amount of CPU and memory resources and it's not necessary in most cases. Specifically, the pr does the following modifications:
UnsafeDisableDeepCopyto lower memory usage.NOTE: This PR removes two mechanisms inside Fluid's controllers:
The two mechanisms will be DISABLED by default since this PR and may be REMOVED in feature. The two mechanism could have little effect on Fluid's users because
(1) For "in-use dataset delete protection", PVC itself has a protection, so it would be enough for safety check.
(2) For "forcibly clean up Dataset PVCs", it's not a good practice to forcibly clean resources. It stucks for a reason, Fluid controllers will report "fail to delete volume" and guides users to found what is stucking.
Backward Compatibility
LEGACY_MECHANISM_BLOCK_IN_USE_DATASET_DELETION=trueLEGACY_MECHANISM_FORCE_CLEAN_UP_MANAGED_PVC=trueⅡ. Does this pull request fix one issue?
fixes #XXXX
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews