diff --git a/keps/976-plain-pods/README.md b/keps/976-plain-pods/README.md
index 24f8821b68..2af74af471 100644
--- a/keps/976-plain-pods/README.md
+++ b/keps/976-plain-pods/README.md
@@ -24,6 +24,7 @@
     - [Single Pods](#single-pods)
     - [Groups of Pods created beforehand](#groups-of-pods-created-beforehand)
     - [Groups of pods where driver generates workers](#groups-of-pods-where-driver-generates-workers)
+    - [Serving Workload](#serving-workload)
   - [Tracking admitted and finished Pods](#tracking-admitted-and-finished-pods)
   - [Retrying Failed Pods](#retrying-failed-pods)
   - [Dynamically reclaiming Quota](#dynamically-reclaiming-quota)
@@ -551,6 +552,24 @@ spec:
               cpu: 1m
 ```
 
+#### Serving Workload
+
+1. The Pod Group integration adds successfully completed pods to the `ReclaimablePods` list. However,
+   this is problematic for serving workloads, such as `StatefulSet`, because it prevents the
+   replacement pod from being ungated. This behavior is incorrect: recreated pods for serving
+   workloads should continue to run regardless of whether the original pod failed or succeeded.
+   To resolve this issue, the `kueue.x-k8s.io/pod-group-serving` annotation can be used. When this
+   annotation is set to `true`, the `ReclaimablePods` mechanism no longer tracks the number of
+   pods, which allows the replacement pod to be ungated.
+2. The Pod Group integration waits until all pods are created. However, for serving workloads such
+   as a `StatefulSet` whose `podManagementPolicy` is `OrderedReady`, pods are created sequentially:
+   each pod is created only after the previous one is running and ready. Because the first pod
+   stays gated until the group is admitted, this sequential behavior can result in a deadlock.
+   To resolve this issue, the `kueue.x-k8s.io/pod-group-fast-admission` annotation can be used.
+   When this annotation is set to `true`, the Pod Group can proceed with admission without
+   requiring all pods to reach the ungated state.
+
+
 ### Tracking admitted and finished Pods
 
 Pods need to have finalizers so that we can reliably track how many of them run to completion and be