Skip to content

Add uninqueueable reason in podgroup condition#3045

Open
lowang-bh wants to merge 2 commits into
volcano-sh:masterfrom
lowang-bh:add_uninqueue_state
Open

Add uninqueueable reason in podgroup condition#3045
lowang-bh wants to merge 2 commits into
volcano-sh:masterfrom
lowang-bh:add_uninqueue_state

Conversation

@lowang-bh

@lowang-bh lowang-bh commented Aug 12, 2023

Copy link
Copy Markdown
Member

Please merge API's PR volcano-sh/apis#113 first, and then I need to update the go.mod and refresh the last commit.

Add un-inqueueable reson in podgroup condition if job is rejected to be enqueue, so that it is more clear when describe podgroup to see why job is pending.

This PR is about to change podgroup's pending condition caused by not enough queue's quota from:

    message: '3/0 tasks in gang unschedulable: pod group is not ready, 3 minAvailable'
    reason: NotEnoughResources

to

    message:  '3/0 tasks in gang unschedulable: pod group is not ready, 3 minAvailabl, origin reason: queue resource quota insufficient'
    reason: NotInqueueable

test result

origin

spec:
  minMember: 3
  minResources:
    count/pods: "3"
    cpu: 1100m
    memory: 200Mi
    pods: "3"
    requests.cpu: 1100m
    requests.memory: 200Mi
  minTaskMember:
    master: 2
    work: 1
  queue: test
status:
  conditions:
  - lastTransitionTime: "2023-08-12T06:27:12Z"
    message: '3/0 tasks in gang unschedulable: pod group is not ready, 3 minAvailable'
    reason: NotEnoughResources
    status: "True"
    transitionID: 35ec245e-f901-47a9-a2b1-ef0456505f86
    type: Unschedulable
  phase: Pending
➜  volcano git:(add_uninqueue_state) ✗ kubectl get events |grep minavailable-job-4d40d46d-8bab-4914-9ddc-7a8e2aeda95a
18s         Normal    Unschedulable             podgroup/minavailable-job-4d40d46d-8bab-4914-9ddc-7a8e2aeda95a   queue resource quota insufficient
19s         Warning   Unschedulable             podgroup/minavailable-job-4d40d46d-8bab-4914-9ddc-7a8e2aeda95a   0/0 tasks in gang unschedulable: pod group is not ready, 3 minAvailable

with this change

➜  volcano git:(add_uninqueue_state) ✗ # image with add_uninqueue_state-67bf4ad9a
➜  volcano git:(add_uninqueue_state) ✗ kubectl get pod -n volcano-system -l app=volcano-scheduler
NAME                                 READY   STATUS    RESTARTS   AGE
volcano-scheduler-5bc9875dbb-sjvvn   1/1     Running   0          3m35s
➜  volcano git:(add_uninqueue_state) ✗ kubectl get deployments.apps -n volcano-system volcano-scheduler -o yaml |grep "image:"
        image: volcanosh/vc-scheduler:add_uninqueue_state-67bf4ad9a
➜  volcano git:(add_uninqueue_state) ✗ kubectl get podgroups.scheduling.volcano.
NAME                                                    STATUS    MINMEMBER   RUNNINGS   AGE
minavailable-job-9374cf52-55a0-4fc5-bb2c-effacd5703d8   Pending   3                      69s
➜  volcano git:(add_uninqueue_state) ✗ kubectl get events |grep minavailable-job-9374cf52-55a0-4fc5-bb2c-effacd5703d8
73s         Normal    Uninqueueable     podgroup/minavailable-job-9374cf52-55a0-4fc5-bb2c-effacd5703d8   queue resource quota insufficient
74s         Warning   Unschedulable     podgroup/minavailable-job-9374cf52-55a0-4fc5-bb2c-effacd5703d8   0/0 tasks in gang unschedulable: pod group is not ready, 3 minAvailable


# yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  creationTimestamp: "2023-08-12T07:29:26Z"
  generation: 3
  name: minavailable-job-9374cf52-55a0-4fc5-bb2c-effacd5703d8
  namespace: default
  ownerReferences:
  - apiVersion: batch.volcano.sh/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: minavailable-job
    uid: 9374cf52-55a0-4fc5-bb2c-effacd5703d8
  resourceVersion: "263313"
  uid: 750510bb-48a0-4cab-8d78-4e466c47b928
spec:
  minMember: 3
  minResources:
    count/pods: "3"
    cpu: 1100m
    memory: 200Mi
    pods: "3"
    requests.cpu: 1100m
    requests.memory: 200Mi
  minTaskMember:
    master: 2
    work: 1
  queue: test
status:
  conditions:
  - lastTransitionTime: "2023-08-12T07:30:32Z"
    message: queue resource quota insufficient
    reason: NotInqueueable
    status: "True"
    transitionID: 80bfabf1-df94-46cf-8d62-546080a83ae3
    type: Unschedulable
  phase: Pending

@volcano-sh-bot volcano-sh-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 12, 2023
@lowang-bh

Copy link
Copy Markdown
Member Author

test result with job can not enqueue
image

@lowang-bh

Copy link
Copy Markdown
Member Author

/assign @wangyang0616 @hwdef @william-wang @Thor-wl

Comment thread go.mod Outdated
Comment thread go.sum Outdated
@lowang-bh

lowang-bh commented Aug 13, 2023

Copy link
Copy Markdown
Member Author

And another change is to append the origin error to msg so that both gang-unschedule info and origin reason displayed.

image image

@lowang-bh lowang-bh force-pushed the add_uninqueue_state branch from 300c1ea to 6467b11 Compare August 13, 2023 04:25
@hwdef

hwdef commented Aug 13, 2023

Copy link
Copy Markdown
Member

I think the pr is well intended, but I have two suggestions:

  1. this pr needs documentation
  2. as far as the current code is concerned, the hints are still too simple, we need hints similar to, there are xx nodes with insufficient cpu, xx nodes with insufficient gpu, xx nodes with unsatisfied memory, xx nodes with unsatisfied affinity

@lowang-bh

lowang-bh commented Aug 13, 2023

Copy link
Copy Markdown
Member Author

there are xx nodes with insufficient cpu, xx nodes with insufficient gpu, xx nodes with unsatisfied memory, xx nodes with unsatisfied affinity

I remember those infor existed in the past version, but some subsequent prs covered those code, and now those info missed.

the hints are still too simple

This pr just add un-enqueueable reson which does not include all cases. I know what you want, eg: issue #2993. It is better to do that in another PR.

@hwdef

hwdef commented Aug 13, 2023

Copy link
Copy Markdown
Member

This pr just add un-enqueueable reson which does not include all cases. I know what you want, eg: issue #2993. It is better to do that in another PR.

ok, I know, but I still want the docs. Because I do not know why we need this status.

@lowang-bh lowang-bh changed the title Add uninqueue state Add uninqueue reason in podgroup condition Aug 14, 2023
@lowang-bh lowang-bh changed the title Add uninqueue reason in podgroup condition Add uninqueueable reason in podgroup condition Aug 14, 2023
@lowang-bh

Copy link
Copy Markdown
Member Author

ok, I know, but I still want the docs. Because I do not know why we need this status.

Yes, I will add it later.

@lowang-bh

Copy link
Copy Markdown
Member Author

Hi, @william-wang , can we push this improvement ahead by merge https://github.com/volcano-sh/apis/pull/113?

@william-wang

Copy link
Copy Markdown
Member

We need to refine following message to let user know the enqueue phase and the detail reason. @Monokaix
"message: '3/0 tasks in gang unschedulable: pod group is not ready, 3 minAvailabl, origin reason: queue resource quota insufficient'
reason: NotInqueueable"

@lowang-bh

Copy link
Copy Markdown
Member Author

/assign @Monokaix

@hwdef hwdef removed their assignment Jan 25, 2024
@lowang-bh

Copy link
Copy Markdown
Member Author

@Monokaix Could we release this in v1.9.0?

@hwdef

hwdef commented Mar 27, 2024

Copy link
Copy Markdown
Member

I think this is important

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2024
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2024
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 26, 2024
@hwdef

hwdef commented Oct 28, 2024

Copy link
Copy Markdown
Member

please squash your commits

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 1, 2024
@lowang-bh lowang-bh force-pushed the add_uninqueue_state branch 2 times, most recently from f485978 to ed40f66 Compare November 2, 2024 02:01
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 2, 2024
@lowang-bh

Copy link
Copy Markdown
Member Author

please squash your commits

Last commit is a temp commit for api import. After volcano-sh/apis#113 merged, it will be removed

@lowang-bh lowang-bh closed this Nov 2, 2024
@lowang-bh lowang-bh reopened this Nov 2, 2024
@lowang-bh lowang-bh closed this Nov 2, 2024
@lowang-bh lowang-bh reopened this Nov 2, 2024
@lowang-bh lowang-bh closed this Nov 16, 2024
@lowang-bh lowang-bh reopened this Nov 16, 2024
@hwdef

hwdef commented Nov 18, 2024

Copy link
Copy Markdown
Member

Please rebase the master code.

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 20, 2024
Signed-off-by: lowang-bh <lhui_wang@163.com>

add enqueueable test case for un-enqueueable reason

Signed-off-by: lowang-bh <lhui_wang@163.com>

store
Signed-off-by: lowang-bh <lhui_wang@163.com>
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2024
@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2025
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

@lowang-bh: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions

Copy link
Copy Markdown

Hello 👋 Looks like there was no activity on this amazing PR for last 180 days.
Do you mind updating us on the status? Is there anything we can help with? If you plan to still work on it, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 90 days, this issue will be closed (we can always reopen a PR if you get back to this!).

@github-actions github-actions Bot added the Stale label Sep 20, 2025
@github-actions

Copy link
Copy Markdown

Closing for now as there was no activity for last 90 days after marked as stale, let us know if you need this to be reopened! 🤗

@github-actions github-actions Bot closed this Dec 19, 2025
@lowang-bh

Copy link
Copy Markdown
Member Author

/re-open

@lowang-bh

Copy link
Copy Markdown
Member Author

/reopen

@volcano-sh-bot

Copy link
Copy Markdown
Contributor

@lowang-bh: Reopened this PR.

Details

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hwdef. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants