Enhancement Description
Administrators often taint nodes that have high-value resources, such as GPUs, so that workloads that do not need those resources do not consume them. To simplify the user experience, some platforms (e.g., GKE) run a webhook that automatically adds tolerations for those taints when a pod has extended resource requests for those resources. This ensures that pods still run even if the user forgets to add the toleration, but only for pods that actually need it.
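For illustration, the admission-time behavior described above can be sketched roughly as follows. This is a minimal sketch, not the code of any real webhook: `PodSpec`, `addTolerationForResource`, the resource name `example.com/gpu`, and the taint key are all hypothetical, and tolerations are reduced to bare taint keys.

```go
package main

import "fmt"

// PodSpec is a simplified, hypothetical stand-in for the parts of a pod
// that an admission webhook would inspect. It is not the Kubernetes type.
type PodSpec struct {
	ExtendedResourceRequests map[string]int64 // resource name -> requested quantity
	TolerationKeys           []string         // taint keys the pod already tolerates
}

// addTolerationForResource returns the spec unchanged unless the pod requests
// resourceName; in that case it ensures taintKey is among the tolerated keys.
func addTolerationForResource(spec PodSpec, resourceName, taintKey string) PodSpec {
	if spec.ExtendedResourceRequests[resourceName] <= 0 {
		return spec // pod does not need the resource, so leave it alone
	}
	for _, k := range spec.TolerationKeys {
		if k == taintKey {
			return spec // user already added the toleration
		}
	}
	out := spec
	// Copy the slice so the caller's spec is not modified.
	out.TolerationKeys = append(append([]string{}, spec.TolerationKeys...), taintKey)
	return out
}

func main() {
	// Assumed names: both the resource and the taint key are made up.
	gpuPod := PodSpec{ExtendedResourceRequests: map[string]int64{"example.com/gpu": 1}}
	plainPod := PodSpec{}

	fmt.Println(addTolerationForResource(gpuPod, "example.com/gpu", "example.com/gpu").TolerationKeys)
	fmt.Println(addTolerationForResource(plainPod, "example.com/gpu", "example.com/gpu").TolerationKeys)
}
```

Everything this sketch needs is in the PodSpec, which is why it works for extended resources at admission time.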
With the advent of DRA, the exact needs of the workload are no longer determinable simply by looking at the PodSpec during API admission. Instead, the resource claims and device classes must also be examined. Additionally, the optionality available in DRA resource claim APIs may mean that several different types of nodes/resources (and therefore several different types of tolerations) are needed. A webhook does not have access to all the information it would need to add the tolerations at API admission time.
We discussed adding a "high value resource" aspect to node capabilities, but after further discussion it's not clear that's the right way to solve this problem. This enhancement request provides an alternative approach.
In this approach, we create a new scheduler plugin (or update the existing taints and tolerations plugin) that can be configured to examine the PodSpec and all associated Resource Claims and DeviceClasses at scheduling time and, based on the needs of the workload, implicitly tolerate taints. Essentially, we move the behavior of the webhook from API server admission time to Pod scheduling time, when all the necessary information is available.
How the tolerations are calculated, and which taints they tolerate, will likely need to be part of the scheduler plugin's configuration, since upstream cannot know what those taints are or when and how they should be tolerated.
This approach requires no new user-facing APIs. Pods that must run on tainted nodes but do not actually need the specialized device (such as management pods) can still be configured explicitly with the appropriate tolerations.
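As a rough illustration of the idea, the scheduling-time check could look something like the sketch below. It is only a sketch under stated assumptions: it does not use the scheduler framework or any real Kubernetes types, the `ImplicitTolerationRule` configuration shape is invented for this example, the device class and taint names are made up, and matching is reduced to taint keys (real tolerations also consider operator, value, and effect).

```go
package main

import "fmt"

// Taint is a simplified stand-in for a node taint.
type Taint struct {
	Key, Value, Effect string
}

// Workload is a hypothetical view of what the plugin can see at scheduling
// time: explicit tolerations from the PodSpec, plus the device classes
// reachable through the pod's resource claims.
type Workload struct {
	ExplicitTolerationKeys []string
	DeviceClasses          []string
}

// ImplicitTolerationRule is a hypothetical plugin configuration entry: a pod
// whose claims reference DeviceClass implicitly tolerates taints with TaintKey.
type ImplicitTolerationRule struct {
	DeviceClass string
	TaintKey    string
}

// toleratedKeys merges explicitly tolerated keys with keys granted by rules.
func toleratedKeys(w Workload, rules []ImplicitTolerationRule) map[string]bool {
	keys := make(map[string]bool)
	for _, k := range w.ExplicitTolerationKeys {
		keys[k] = true
	}
	for _, r := range rules {
		for _, dc := range w.DeviceClasses {
			if dc == r.DeviceClass {
				keys[r.TaintKey] = true
			}
		}
	}
	return keys
}

// untoleratedTaints returns the node taints that the workload tolerates
// neither explicitly nor implicitly. An empty result means the node passes.
func untoleratedTaints(nodeTaints []Taint, w Workload, rules []ImplicitTolerationRule) []Taint {
	keys := toleratedKeys(w, rules)
	var out []Taint
	for _, t := range nodeTaints {
		if !keys[t.Key] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Assumed configuration and node state; all names are illustrative.
	rules := []ImplicitTolerationRule{{DeviceClass: "gpu.example.com", TaintKey: "example.com/gpu"}}
	nodeTaints := []Taint{{Key: "example.com/gpu", Value: "present", Effect: "NoSchedule"}}

	gpuPod := Workload{DeviceClasses: []string{"gpu.example.com"}}
	plainPod := Workload{}
	mgmtPod := Workload{ExplicitTolerationKeys: []string{"example.com/gpu"}}

	fmt.Println("gpu pod blocked by:", len(untoleratedTaints(nodeTaints, gpuPod, rules)))
	fmt.Println("plain pod blocked by:", len(untoleratedTaints(nodeTaints, plainPod, rules)))
	fmt.Println("management pod blocked by:", len(untoleratedTaints(nodeTaints, mgmtPod, rules)))
}
```

Keeping the rules in plugin configuration, rather than hard-coding them, reflects the point that upstream does not know which taints a given platform applies.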
/cc @pohly @klueska @pravk03 @dom4ha @dchen1107
/sig scheduling
/wg device-management
- One-line enhancement description (can be used as a release note): Enable configuration of the scheduler to implicitly tolerate taints based on data found in the PodSpec, Resource Claims, and Device Classes
- Kubernetes Enhancement Proposal: TBD
- Discussion Link:
- Primary contact (assignee): @johnbelamaric
- Responsible SIGs: Scheduling
- Enhancement target (which target equals to which milestone):
- Alpha release target (x.y): 1.34
- Beta release target (x.y):
- Stable release target (x.y):
- Alpha
  - KEP (k/enhancements) update PR(s): [KEP-5282] Add KEP for Implicit Tolerations #5389
  - Code (k/k) update PR(s):
  - Docs (k/website) update PR(s):
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.