I encountered the same problem. After I added a machine with a new GPU type to the cluster and scheduled a pod to it, the pod was evicted by the HighNodeUtilization policy. It feels very risky to use this policy in a production environment.
What version of descheduler are you using?
descheduler version: v0.28.0
Does this issue reproduce with the latest release?
Yes
Which descheduler CLI options are you using?
Please provide a copy of your descheduler policy config file
What k8s version are you using (kubectl version)?
What did you do?
What did you expect to see?
If a node does not have a resource that is specified in the profile's thresholds, it should not be considered an underutilized node and therefore should not be subject to eviction.
In a cluster with multiple types of GPU nodes, if a node's GPU type is not included in the thresholds of the configuration file, that node is deemed underutilized and its pods are evicted. This happens even when GPU allocation on that node is at 100%, which is clearly unreasonable.
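For concreteness, a HighNodeUtilization policy of the kind described here, with a threshold for a single GPU type, would look roughly like the sketch below. This is not the reporter's actual config file (which is not included above); the profile name and the 20 percent values are assumed, and only the overall v1alpha2 shape follows the descheduler documentation.

```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: default
    pluginConfig:
      - name: "HighNodeUtilization"
        args:
          # Nodes whose usage falls below ALL of these thresholds are
          # classified as underutilized, and their pods become
          # candidates for eviction.
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
            cloudml.gpu/v100-32g: 20
    plugins:
      balance:
        enabled:
          - "HighNodeUtilization"
```

With such a policy, a node that carries a different GPU type has no cloudml.gpu/v100-32g capacity at all, so its usage for that resource is reported as 0 (as in the log below), and only its cpu, memory, and pods usage ends up deciding whether it is underutilized, regardless of how fully its own GPU is allocated.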
What did you see instead?
In the cluster, a node that did not have any "cloudml.gpu/v100-32g" resources was mistakenly identified as an underutilized node, and pods were consequently evicted from it.
2025-02-20T12:06:51.894357974Z I0220 12:06:51.894295 1 nodeutilization.go:198] "Node is underutilized" node="test", usage=map[cloudml.gpu/v100-32g:0 cpu:13777m memory:19106058Ki pods:33] usagePercentage=map[cpu:10.76328125 memory:1.8098670918577757 pods:12.992125984251969]