feat: add queue-level scheduling policy#4380
Conversation
|
/cc |
|
/cc |
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for queue-level scheduling policies by introducing a SchedulingPolicy construct that allows configuring actions and plugins per queue. Key changes include:
- Extending the scheduler to load multiple configuration files (global vs. per-queue), storing them in
schedulingPolicies, and passing them into each session. - Enhancing
Sessionandframework.OpenSessionto register policy-specific plugins/actions, and updating action execution loops to respect per-job or global policy. - Adding utility methods (
UnionActions,getSchedulingPolicyConf, feature-gated checks) and updating tests to account for the new signature ofOpenSession.
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/scheduler/scheduler.go | Load per-queue config files, manage schedulingPolicies, pass into sessions, implement UnionActions. |
| pkg/scheduler/framework/session.go | Add SchedulingPolicy and PluginRegistry to Session, implement getters and policy-based HasAction. |
| pkg/scheduler/framework/framework.go | Update OpenSession/CloseSession to initialize/teardown policy plugins. |
| pkg/scheduler/framework/util.go | Add helpers to copy and clear session functions into policies. |
| pkg/features/volcano_features.go | Introduce SchedulingPolicy feature gate. |
| cmd/scheduler/app/options/options.go | Change --scheduler-conf to accept a directory. |
Various *_test.go |
Updated OpenSession calls to new signature. |
Comments suppressed due to low confidence (3)
pkg/scheduler/scheduler.go:286
- [nitpick] Consider adding unit tests for
UnionActions, verifying that global and policy-specific actions merge correctly under various feature-gate and policy scenarios.
func (pc *Scheduler) UnionActions() []framework.Action {
pkg/scheduler/scheduler.go:222
- [nitpick] Add a doc comment explaining the purpose and return values of
getSchedulingPolicyConfto clarify what the two maps represent.
func (pc *Scheduler) getSchedulingPolicyConf() (map[string][]string, map[string][]string) {
pkg/scheduler/scheduler.go:53
- [nitpick] Rename
PolicyMutextoschedulingPoliciesMutex(or similar) to clearly associate it with theschedulingPoliciesfield.
PolicyMutex sync.Mutex
| schedulingPolicies := pc.schedulingPolicies | ||
| pc.mutex.Unlock() | ||
|
|
There was a problem hiding this comment.
Access to pc.schedulingPolicies here is not protected by PolicyMutex, leading to a potential race. Consider locking PolicyMutex around reads (and writes) of schedulingPolicies.
| schedulingPolicies := pc.schedulingPolicies | |
| pc.mutex.Unlock() | |
| pc.mutex.Unlock() | |
| pc.PolicyMutex.Lock() | |
| schedulingPolicies := pc.schedulingPolicies | |
| pc.PolicyMutex.Unlock() |
| func (pc *Scheduler) getSchedulingPolicyConf() (map[string][]string, map[string][]string) { | ||
| policyActions := make(map[string][]string) | ||
| policyPlugins := make(map[string][]string) | ||
|
|
There was a problem hiding this comment.
Iterating over pc.schedulingPolicies without holding PolicyMutex can race with concurrent updates. Acquire the mutex before looping through the map.
| pc.PolicyMutex.Lock() | |
| defer pc.PolicyMutex.Unlock() |
80e19dd to
6e039f8
Compare
5889ace to
da8eb06
Compare
|
Made the following changes based on the weekly meeting discussion:
Additionally, it was found that using DefaultSchedulingPolicy conflicts with the extension points in ssn, therefore, the global configuration is still kept in ssn. |
|
Great work! |
5f7053d to
e0b957b
Compare
|
/assign @JesseStutler |
|
Please rebase master and solve code conflict. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: Yuqi Wu <wuyuqi22@mails.ucas.ac.cn>
c3400b3 to
da4c25f
Compare
| } | ||
|
|
||
| // GetSchedulingPolicyFromJob return the SchedulingPolicy from the job. | ||
| func GetSchedulingPolicyFromJob(job *api.JobInfo) *SchedulingPolicy { |
There was a problem hiding this comment.
Given the mutable nature of SchedulingPolicy, GetSchedulingPolicyFromTask must retrieve the most up-to-date policy from the Job's associated queue on each call. An alternative is to refresh job.SchedulingPolicy with the current Queue.SchedulingPolicy when OpenSession is invoked.
The same to GetSchedulingPolicyFromQueue and GetSchedulingPolicyFromTask.
If we opt to reset xxx.SchedulingPolicy during OpenSession, it's crucial to set queue.SchedulingPolicy first, followed by jobInfo.SchedulingPolicy and then taskInfo.SchedulingPolicy.
What's your take on this? @JesseStutler .
|
@ElectricFish7: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Hi @DavidYQWu , Thanks for your work, could u continue finish this ? |
kingeasternsun
left a comment
There was a problem hiding this comment.
Thanks for your work.
I’ll address the conflicts and add test files, but could you take a look at the implementation in this PR first and let me know if you notice any issues? I can fix them together with you. |
What type of PR is this?
Add queue-level scheduling policies
What this PR does / why we need it:
This PR introduces queue-level scheduling policy. As outlined in the implementation section of the design document(#4163), it enables configuring different Actions and Plugins per queue using the SchedulingPolicy. This design also allows for straightforward extension to job-level policies in the future.
Note:
Which issue(s) this PR fixes:
Fixes ##3992
Special notes for your reviewer:
@Monokaix
@JesseStutler
Does this PR introduce a user-facing change?