OADP-7541: fix non-deterministic matchExpressions ordering causing node-agent restarts#2234
Conversation
…de-agent restarts When a DPA configures nodeAgent.podConfig.nodeSelector with multiple labels, kube.ToSystemAffinity() iterates Go maps in random order to build matchExpressions. Kubernetes treats arrays as ordered, so each order flip registers as a spec change, triggering DaemonSet rollouts every reconciliation (~30s). Customer observed generation 13858 on node-agent while velero deployment stayed at 2. Sort matchExpressions by key after building them from LoadAffinityConfig to ensure deterministic ordering. Applied to both node-agent DaemonSet and Velero Deployment code paths. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
|
@kaovilai: This pull request references OADP-7541 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughSort NodeSelectorTerm MatchExpressions/MatchFields by requirement key via new common.SortNodeSelectorTerms; apply the sorter in NodeAgent and Velero affinity assembly and add tests asserting deterministic ordering. ChangesDeterministic node affinity ordering
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kaovilai The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/cherry-pick oadp-1.6 |
|
@kaovilai: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherry-pick oadp-1.5 |
|
@kaovilai: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Sort expected matchExpressions alphabetically by key to match the deterministic ordering introduced in the previous commit. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
|
/retest-required Note Responses generated with Claude
|
| terms = append(terms, a.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms...) | ||
| } | ||
| } | ||
| common.SortNodeSelectorTerms(terms) |
There was a problem hiding this comment.
Have you considered filing an upstream Velero issue (or PR) to sort there? That would fix it for all consumers, not just OADP. This workaround is fine to merge now, but an upstream fix would let us eventually drop it.
| // Go map iteration randomness in upstream kube.ToSystemAffinity | ||
| // produces non-deterministic matchExpressions order, which Kubernetes | ||
| // treats as a spec change and triggers unnecessary DaemonSet rollouts. | ||
| func SortNodeSelectorTerms(terms []corev1.NodeSelectorTerm) { |
There was a problem hiding this comment.
SortNodeSelectorTerms only sorts MatchExpressions but the name suggests it handles the full NodeSelectorTerm. Two options:
- Also sort MatchFields for completeness (same pattern, one extra loop).
- Rename to something like SortMatchExpressions to match what it actually does.
I'd lean toward option 1 since it's trivial and future-proofs us if MatchFields ever comes into play.
There was a problem hiding this comment.
Good call — added MatchFields sorting too in 516587c. Extracted the comparator to avoid duplication.
Note
Responses generated with Claude
Address review feedback: sort both MatchExpressions and MatchFields by key for completeness and future-proofing. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
|
@kaovilai: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
nodeAgent.podConfig.nodeSelectorwith multiple labels,kube.ToSystemAffinity()iterates Go maps in random order to buildmatchExpressions. Kubernetes treats arrays as ordered, so each order flip registers as a spec change, triggering DaemonSet rollouts every reconciliation (~30s). Customer observed generation 13858 on node-agent while velero deployment stayed at 2.common.SortNodeSelectorTerms()helper that sortsMatchExpressionsby key within eachNodeSelectorTermTest plan
TestNodeAgentAffinityMatchExpressionsDeterministicOrder(3 sub-cases × 100 iterations)TestVeleroAffinityMatchExpressionsDeterministicOrder(100 iterations)-count=5for additional confidence🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests