Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: verify quota according to resource dimensions required by pod #1753

Merged
merged 1 commit into from
Nov 30, 2023

Conversation

qinfustu
Copy link
Contributor

Ⅰ. Describe what this PR does

fix the resources required by the pod in the quota are sufficient but the pod cannot be scheduled.

Ⅱ. Does this pull request fix one issue?

The resources required by the pod in the quota are sufficient but the pod cannot be scheduled.
e.q.
quota info

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-sub
  namespace: default
  annotations:
    quota.scheduling.koordinator.sh/runtime: '{"cpu":"40","memory":"40Gi", "nvidia.com/gpu": "6"}'
  labels:
    quota.scheduling.koordinator.sh/parent: "quota-parent"
    quota.scheduling.koordinator.sh/is-parent: "false"
spec:
  max:
    cpu: 40
    memory: 40Gi
    nvidia.com/gpu: 8
  min:
    cpu: 40
    memory: 40Gi
    nvidia.com/gpu: 8
status:
  used:
    cpu: 10
    memory: 20Gi
    nvidia.com/gpu: 8

pod info

apiVersion: v1
kind: Pod
metadata:
  name: pod-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/name: "quota-sub"
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 2
        memory: 4Gi
      requests:
        cpu: 2
        memory: 4Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

At this time, the pod will not be scheduled due to insufficient quota resources(nvidia.com/gpu: runtime < used). In fact, the cpu and memory required by the pod are sufficient.
There are scenarios where runtime < used appears:

  • Enable the borrowing / returning of resources between different quota. But disable "used > runtime revoke" logic.
  • Operators adjust quota resource distribution.

App version: 1.20
Kubernetes version (use kubectl version): 1.19

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

Copy link

codecov bot commented Nov 22, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (3784df1) 66.11% compared to head (4fcc3a1) 66.10%.
Report is 7 commits behind head on main.

Files Patch % Lines
pkg/scheduler/plugins/elasticquota/preempt.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1753      +/-   ##
==========================================
- Coverage   66.11%   66.10%   -0.02%     
==========================================
  Files         388      388              
  Lines       42425    42475      +50     
==========================================
+ Hits        28048    28076      +28     
- Misses      12305    12324      +19     
- Partials     2072     2075       +3     
Flag Coverage Δ
unittests 66.10% <83.33%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eahydra eahydra changed the title scheduler: fix the resources required by the pod in the quota are sufficient but the pod cannot be scheduled scheduler: verify quota according to resource dimensions required by pod Nov 23, 2023
@koordinator-bot koordinator-bot bot added size/S and removed size/L labels Nov 25, 2023
@eahydra
Copy link
Member

eahydra commented Nov 27, 2023

@qinfustu This PR has been very clear. Nice job! And it is recommended to add some specific UT cases. This scenario is indeed easily overlooked.

@qinfustu
Copy link
Contributor Author

I am currently on paternity leave. Please help me follow up on PR merger, @eahydra. Thx~
Besides,I will commit the UT cases over the leave.

@eahydra
Copy link
Member

eahydra commented Nov 30, 2023

/lgtm
/approve

@koordinator-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eahydra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 184c1d0 into koordinator-sh:main Nov 30, 2023
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants