HighNodeUtilization evicts pods from nodes that do not have the resource types specified in the configuration profile thresholds #1634

Open
JBinin opened this issue Feb 20, 2025 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@JBinin

JBinin commented Feb 20, 2025

What version of descheduler are you using?

descheduler version: v0.28.0

Does this issue reproduce with the latest release?

Yes

Which descheduler CLI options are you using?

Please provide a copy of your descheduler policy config file

  - name: HighNodeUtilization
    args:
      thresholds:
        "cloudml.gpu/v100-32g": 40

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version

What did you do?

What did you expect to see?
If a node does not possess the resources specified in the configuration profile thresholds, then it should not be considered an underutilized node and thus should not be subject to eviction.

In a cluster with multiple types of GPU nodes, if a certain GPU type is not included in the threshold settings of the configuration file, the nodes hosting that GPU are deemed underutilized and their pods are evicted. This occurs even if GPU allocation on those nodes is at 100%, which is evidently unreasonable.

What did you see instead?
In the cluster, a node that did not have any "cloudml.gpu/v100-32g" resources was mistakenly identified as an underutilized node, and its pods were consequently evicted.

2025-02-20T12:06:51.894357974Z I0220 12:06:51.894295 1 nodeutilization.go:198] "Node is underutilized" node="test", usage=map[cloudml.gpu/v100-32g:0 cpu:13777m memory:19106058Ki pods:33] usagePercentage=map[cpu:10.76328125 memory:1.8098670918577757 pods:12.992125984251969]
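To illustrate the suspected mechanism, here is a simplified Go sketch (not the descheduler's actual code; isUnderutilized and its inputs are hypothetical): a resource that the node does not advertise shows up as 0% usage, so every threshold comparison passes and the node is classified as underutilized even though its GPUs may be fully allocated.

package main

import "fmt"

// isUnderutilized is a hypothetical, simplified stand-in for the real check.
// usagePercent: resource name -> percent of node capacity in use.
// thresholds:   resource name -> node counts as "underutilized" below this percent.
func isUnderutilized(usagePercent, thresholds map[string]float64) bool {
    for res, limit := range thresholds {
        // A resource the node does not have is missing from the map,
        // which reads as 0% usage and is always under the threshold.
        if usagePercent[res] >= limit {
            return false
        }
    }
    return true
}

func main() {
    // Usage percentages from the log line above; the node has no
    // cloudml.gpu/v100-32g capacity, so that key is simply absent.
    usage := map[string]float64{
        "cpu":    10.76,
        "memory": 1.81,
        "pods":   12.99,
    }
    thresholds := map[string]float64{"cloudml.gpu/v100-32g": 40}
    fmt.Println(isUnderutilized(usage, thresholds)) // prints: true
}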

@JBinin JBinin added the kind/bug Categorizes issue or PR as related to a bug. label Feb 20, 2025
@LY-today

I encountered the same problem. After I added a new type of GPU machine to the cluster and scheduled a pod onto it, the pod was evicted by the HighNodeUtilization policy. It feels very risky to use this policy in a production environment.

@googs1025
Member

/cc
