@@ -348,24 +348,24 @@ list, it will not add a reference to the pod.

##### Deallocation
The resourceclaim controller will remove Pod references from the `ReservedFor` list just
- like it does now using the same logic. For non-Pod references, the controller will recognize
- a small number of built-in types, starting with `Deployment`, `StatefulSet` and `Job`, and will
- remove the reference from the list when those resources are removed. For other types,
- it will be the responsibility of the workload controller/user that created the `ResourceClaim`
- to remove the reference to the non-Pod resource from the `ReservedFor` list when no pods
+ like it does now using the same logic. But for non-Pod references, it
+ will be the responsibility of the controller/user that created the `ResourceClaim` to
+ remove the reference to the non-Pod resource from the `ReservedFor` list when no pods
are consuming the `ResourceClaim` and no new pods will be created that reference
the `ResourceClaim`.

The resourceclaim controller will then discover that the `ReservedFor` list is empty
and therefore know that it is safe to deallocate the `ResourceClaim`.

- This requires that the resourceclaim controller watches the workload types that will
- be supported. For other types of workloads, there will be a requirement that the workload
- controller has permissions to update the status subresource of the `ResourceClaim`. The
- resourceclaim controller will also try to detect if an unknown resource referenced in the
- `ReservedFor` list has been deleted from the cluster, but that requires that the controller
- has permissions to get or list resources of the type. If the resourceclaim controller is
- not able to check, it will just wait until the reference in the `ReservedFor` list is removed.
+ This requires that the controller/user has permissions to update the status
+ subresource of the `ResourceClaim`. The resourceclaim controller will also try to detect if
+ the resource referenced in the `ReservedFor` list has been deleted from the cluster, but
+ that requires that the controller has permissions to get or list resources of the type. If the
+ resourceclaim controller is not able to check, it will just wait until the reference in
+ the `ReservedFor` list is removed. The resourceclaim controller will not have a watch
+ on the workload resource, so there is no guarantee that the controller will notice that
+ the resource has been deleted. This is only an extra safety check, since it remains the
+ responsibility of the workload controller to update the claim.
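+
+ As a non-normative sketch of the cleanup this places on the workload controller, the
+ following Go snippet removes the workload's own reference from the `ReservedFor` list
+ via the status subresource. The helper name and the use of the v1beta1 API group are
+ assumptions for illustration, not part of the proposal:
+
+ ```go
+ package workload
+
+ import (
+     "context"
+
+     resourcev1beta1 "k8s.io/api/resource/v1beta1"
+     metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+     "k8s.io/apimachinery/pkg/types"
+     "k8s.io/client-go/kubernetes"
+ )
+
+ // releaseClaim removes the workload's reference from status.ReservedFor so that
+ // the resourceclaim controller eventually sees an empty list and deallocates the
+ // claim. It must only be called once no pods consume the ResourceClaim and no new
+ // pods referencing it will be created. It requires permission to update the
+ // resourceclaims/status subresource.
+ func releaseClaim(ctx context.Context, cs kubernetes.Interface, claim *resourcev1beta1.ResourceClaim, workloadUID types.UID) error {
+     kept := claim.Status.ReservedFor[:0]
+     for _, ref := range claim.Status.ReservedFor {
+         if ref.UID != workloadUID {
+             kept = append(kept, ref)
+         }
+     }
+     claim.Status.ReservedFor = kept
+     // A real controller would wrap this update in retry-on-conflict.
+     _, err := cs.ResourceV1beta1().ResourceClaims(claim.Namespace).UpdateStatus(ctx, claim, metav1.UpdateOptions{})
+     return err
+ }
+ ```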

##### Finding pods using a ResourceClaim
If the reference in the `ReservedFor` list is to a non-Pod resource, controllers can no longer
@@ -473,6 +473,22 @@ The API server will no longer accept the new fields and the other components wil
not know what to do with them. So the result is that the `ReservedFor` list will only
have references to pod resources like today.

+ Any ResourceClaims that were already allocated while the feature was active will
+ have non-pod references in the `ReservedFor` list after a downgrade, but the controllers
+ will not know how to handle them. Two problems will arise as a result of this (see the
+ sketch below):
+ - The workload controller will also have been downgraded if it is in-tree, meaning that
+ it will not remove the reference to the workload resource from the `ReservedFor` list,
+ thus leading to a situation where the claim will never be deallocated.
+ - For new pods that get scheduled, the scheduler will add pod references to the
+ `ReservedFor` list, despite there already being a non-pod reference in it. So the list
+ ends up with both pod and non-pod references, which breaks the assumptions of the design.
+ We need to make sure the system can handle this, as it might also happen as a result of
+ disablement and re-enablement of the feature.
+
+ We can address this by adding the logic for handling non-pod references in 1.34 and then
+ adding the actual feature in 1.35.
+
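+ As a non-normative sketch of such defensive handling, a component could classify the
+ entries in the `ReservedFor` list before acting on them. The helper below is hypothetical
+ and relies only on pods being referenced with the empty (core) API group and the
+ resource name "pods":
+
+ ```go
+ package workload
+
+ import resourcev1beta1 "k8s.io/api/resource/v1beta1"
+
+ // splitConsumers separates pod references from non-pod (workload) references so
+ // that callers can cope with a mixed ReservedFor list, e.g. after a downgrade or
+ // a disable/re-enable cycle of the feature.
+ func splitConsumers(refs []resourcev1beta1.ResourceClaimConsumerReference) (pods, workloads []resourcev1beta1.ResourceClaimConsumerReference) {
+     for _, ref := range refs {
+         if ref.APIGroup == "" && ref.Resource == "pods" {
+             pods = append(pods, ref)
+         } else {
+             workloads = append(workloads, ref)
+         }
+     }
+     return pods, workloads
+ }
+ ```
+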
### Version Skew Strategy

If the kubelet is on a version that doesn't support the feature but the rest of the
@@ -482,10 +498,9 @@ since it will still check whether the `Pod` is references in the `ReservedFor` l
If the API server is on a version that supports the feature, but the scheduler
is not, the scheduler will not know about the new fields added, so it will
put the reference to the `Pod` in the `ReservedFor` list rather than the reference
- in the `spec.ReservedFor` list. As a result, the workload will get scheduled, but
- it will be subject to the 256 limit on the size of the `ReservedFor` list and the
- controller creating the `ResourceClaim` will not find the reference it expects
- in the `ReservedFor` list when it tries to remove it.
+ in the `spec.ReservedFor` list. It will do this even if there is already a non-pod
+ reference in the `spec.ReservedFor` list. This leads to the challenge described
+ in the previous section.

## Production Readiness Review Questionnaire

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

- Applications that were already running will continue to run and the allocated
- devices will remain so.
- For the resource types supported directly, the resource claim controller will not remove the
- reference in the `ReservedFor` list, meaning the devices will not be deallocated. If the workload
- controller is responsible for removing the reference, deallocation will work as long as the
- feature isn't also disabled in the controllers. If they are, deallocation will not happen in this
- situation either.
+ Applications that were already running will continue to run. But if a pod has to be
+ re-admitted by a kubelet where the feature has been disabled, the pod will fail admission,
+ since the kubelet will not find a reference to the pod in the `ReservedFor` list.
+
+ The feature will also be disabled for in-tree workload controllers, meaning that they will
+ not remove the reference to the workload from the `ReservedFor` list. This means the list will
+ never be empty and the resourceclaim controller will never deallocate the claim.
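+
+ The kubelet-side admission check amounts to a lookup of the pod's UID in the list; a
+ minimal sketch, assuming the v1beta1 types (the helper name is hypothetical):
+
+ ```go
+ package workload
+
+ import (
+     resourcev1beta1 "k8s.io/api/resource/v1beta1"
+     "k8s.io/apimachinery/pkg/types"
+ )
+
+ // isPodReserved reports whether a pod may be (re-)admitted: with the feature
+ // disabled, the kubelet only admits pods it finds in status.ReservedFor.
+ func isPodReserved(claim *resourcev1beta1.ResourceClaim, podUID types.UID) bool {
+     for _, ref := range claim.Status.ReservedFor {
+         if ref.APIGroup == "" && ref.Resource == "pods" && ref.UID == podUID {
+             return true
+         }
+     }
+     return false
+ }
+ ```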

###### What happens if we reenable the feature if it was previously rolled back?

It will take effect again and will impact how the `ReservedFor` field is used during allocation
- and deallocation.
+ and deallocation. Since this scenario allows a ResourceClaim to have the `spec.ReservedFor` field
+ set and then have the scheduler populate the `ReservedFor` list with pods while the feature
+ is disabled, we can end up in a situation where the `ReservedFor` list contains both non-pod
+ and pod references. We need to make sure all components can handle that.

###### Are there any tests for feature enablement/disablement?

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

- No
+ Yes and no. We are adding two new fields to the ResourceClaim type, but neither is of a
+ collection type, so they should have limited impact on the total size of the objects. However,
+ this feature means that we no longer need to keep a complete list of all pods using a
+ ResourceClaim, which can significantly reduce the size of ResourceClaim objects shared by
+ many pods.
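+
+ As a non-normative illustration of why the size impact is constant, the spec-level field
+ referred to in this document as `spec.ReservedFor` could look roughly like the sketch
+ below. The field name, type, and placement are assumptions; the real API may differ:
+
+ ```go
+ type ResourceClaimSpec struct {
+     // Devices and other existing fields are unchanged and elided here.
+
+     // ReservedFor references the single non-pod consumer (for example a Job)
+     // that the scheduler should record in status.ReservedFor instead of adding
+     // one entry per pod. A single reference rather than a collection keeps the
+     // per-object size contribution constant.
+     // +optional
+     ReservedFor *ResourceClaimConsumerReference `json:"reservedFor,omitempty"`
+ }
+ ```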

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?