- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
+ - [Feature Gate handling](#feature-gate-handling)
- [User Stories (Optional)](#user-stories-optional)
- [Story 1 - Pod VS NetworkPolicy](#story-1---pod-vs-networkpolicy)
- [Story 2 - having finalizer conflicts with deletion order](#story-2---having-finalizer-conflicts-with-deletion-order)
- [Integration tests](#integration-tests)
- [e2e tests](#e2e-tests)
- [Graduation Criteria](#graduation-criteria)
- - [Alpha](#alpha)
- [Beta](#beta)
- [GA](#ga)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
@@ -140,6 +140,12 @@ the resources associated with this namespace should be deleted in order:
- Wait for all the pods to be stopped or deleted.
- Delete all the other resources in the namespace (in an undefined order).

+ ### Feature Gate handling
+
+ Because this KEP addresses a security concern, and we want to offer a way to close the security gap in already-released versions,
+ the feature gate will be introduced as beta and on by default in the 1.33 release. We will backport the feature gate with an off-by-default
+ configuration to all supported releases. See [the detailed discussion on Slack](https://kubernetes.slack.com/archives/CJH2GBF7Y/p1741258168683299).
+
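To make the gated behavior concrete, the following is a minimal sketch of how the deletion plan might differ with the gate on and off. It is not the namespace controller's code: `Resource`, `planDeletion`, and the `orderedDeletion` flag are hypothetical names that stand in for the real objects and for the feature gate, and the sketch only models the two phases (pods first, then everything else), not the waiting between them.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for a namespaced object.
type Resource struct {
	Kind string
	Name string
}

// planDeletion returns the deletion phases for a namespace's resources.
// orderedDeletion is an assumed boolean standing in for the feature gate:
// when true, pods form the first phase and all other resources the second;
// when false, everything is deleted in a single phase with no ordering.
func planDeletion(resources []Resource, orderedDeletion bool) [][]Resource {
	if !orderedDeletion {
		// Copy so that callers cannot mutate the input through the plan.
		return [][]Resource{append([]Resource(nil), resources...)}
	}
	var pods, others []Resource
	for _, r := range resources {
		if r.Kind == "Pod" {
			pods = append(pods, r)
		} else {
			others = append(others, r)
		}
	}
	return [][]Resource{pods, others}
}

func main() {
	resources := []Resource{
		{Kind: "NetworkPolicy", Name: "deny-all"},
		{Kind: "Pod", Name: "web-0"},
		{Kind: "Secret", Name: "tls"},
	}
	for i, phase := range planDeletion(resources, true) {
		fmt.Printf("phase %d: %v\n", i+1, phase)
	}
}
```

With the flag set to `false`, as in the backported off-by-default configuration, the plan collapses to a single phase, which corresponds to the behavior before this change.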
### User Stories (Optional)

#### Story 1 - Pod VS NetworkPolicy
@@ -348,24 +354,17 @@ in back-to-back releases.
- Address feedback on usage/changed behavior, provided on GitHub issues
- Deprecate the flag
-->
- #### Alpha
+ #### Beta

- Feature implemented behind a feature flag
- Initial e2e tests completed and enabled
-
- #### Beta
-
- - Gather feedback from developers and surveys
- Complete features specified in the KEP
- Proper metrics added
- Additional tests are in Testgrid and linked in KEP

#### GA

- - N examples of real-world usage
- - N installs
- - More rigorous forms of testing—e.g., downgrade tests and scalability tests
- - Allowing time for feedback
+ - Related [CVE](https://github.com/kubernetes/kubernetes/issues/126587) has been mitigated
- Conformance tests

**Note:** Generally we also wait at least two releases between beta and
@@ -444,13 +443,15 @@ feature flags will be enabled on some API servers and not others during the
rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->
+ This feature should not impact rollout.

###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->
+ N/A

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

@@ -459,12 +460,14 @@ Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->
+ N/A

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

<!--
Even if applying deprecation policies, they may still surprise some users.
-->
+ No.

### Monitoring Requirements

@@ -482,6 +485,7 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->
+ Check whether the feature gate is enabled. The feature is a security fix and should not be detectable by users.

###### How can someone using this feature know that it is working for their instance?

@@ -494,13 +498,7 @@ and operation of this feature.
Recall that end users cannot usually observe component logs or access metrics.
-->

- - [ ] Events
- - Event Reason:
- - [ ] API .status
- - Condition name:
- - Other field:
- - [ ] Other (treat as last resort)
- - Details:
+ N/A

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -518,26 +516,22 @@ high level (needs more precise definitions) those may be things like:
These goals will help you determine what you need to measure (SLIs) in the next
question.
-->
+ The feature only affects namespace deletion and should not affect existing SLOs.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one or more of these and delete the rest.
-->
-
- - [ ] Metrics
- - Metric name:
- - [Optional] Aggregation method:
- - Components exposing the metric:
- - [ ] Other (treat as last resort)
- - Details:
+ Any error or blocker will be reported in the namespace status subresource, following the existing pattern.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

<!--
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->
+ Namespace status will be used to capture possible errors or blockers during deletion.

### Dependencies

@@ -561,7 +555,7 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
- Impact of its outage on the feature:
- Impact of its degraded performance or high-error rates on the feature:
-->
-
+ No.
### Scalability

<!--
@@ -588,6 +582,7 @@ Focusing mostly on:
- periodic API calls to reconcile state (e.g. periodic fetching state,
heartbeats, leader election, etc.)
-->
+ No.

###### Will enabling / using this feature result in introducing new API types?

@@ -597,15 +592,15 @@ Describe them, providing:
- Supported number of objects per cluster
- Supported number of objects per namespace (for namespace-scoped objects)
-->
-
+ No.
###### Will enabling / using this feature result in any new calls to the cloud provider?

<!--
Describe them, providing:
- Which API(s):
- Estimated increase:
-->
-
+ No.
###### Will enabling / using this feature result in increasing size or count of the existing API objects?

<!--
@@ -614,7 +609,7 @@ Describe them, providing:
- Estimated increase in size: (e.g., new annotation of size 32B)
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
-->
-
+ No.
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

<!--
@@ -625,7 +620,7 @@ Think about adding additional work or introducing new steps in between

[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
-->
-
+ No.
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

<!--
@@ -637,7 +632,7 @@ This through this both in small and large cases, again with respect to the

[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
-->
-
+ No.
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

<!--
@@ -649,7 +644,7 @@ If any of the resources can be exhausted, how this is mitigated with the existin
Are there any tests that were run/should be run to understand performance characteristics better
and validate the declared limits?
-->
-
+ No.
### Troubleshooting

<!--
@@ -664,7 +659,7 @@ details). For now, we leave it here.
-->

###### How does this feature react if the API server and/or etcd is unavailable?
-
+ The namespace controller behaves exactly the same with or without this feature.
###### What are other known failure modes?

<!--
@@ -679,9 +674,9 @@ For each of them, fill in the following information by copying the below templat
Not required until feature graduated to beta.
- Testing: Are there any tests for failure mode? If not, describe why.
-->
-
+ With the feature gate enabled, namespace deletion might hang if pod deletion runs into issues.
###### What steps should be taken if SLOs are not being met to determine the problem?
-
+ Delete the blocking resources manually.
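Since errors and blockers are surfaced through namespace status, one way to find what is holding up a deletion is to look for deletion-related conditions that are set to `True`. The sketch below is illustrative only: `Condition` and `deletionBlockers` are hypothetical, stdlib-only stand-ins rather than client-go types, and the listed condition types are the ones the namespace controller already reports today. Whether this feature reuses exactly these types is an assumption based on the statement that it follows the existing pattern.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified mirror of a namespace status
// condition (type, status, message).
type Condition struct {
	Type    string
	Status  string
	Message string
}

// deletionConditionTypes lists the deletion-related namespace condition
// types that exist upstream today. Assumption: blockers introduced by
// ordered deletion are reported under these same types.
var deletionConditionTypes = map[string]bool{
	"NamespaceDeletionDiscoveryFailure":           true,
	"NamespaceDeletionContentFailure":             true,
	"NamespaceDeletionGroupVersionParsingFailure": true,
	"NamespaceContentRemaining":                   true,
	"NamespaceFinalizersRemaining":                true,
}

// deletionBlockers returns the deletion-related conditions whose status is "True".
func deletionBlockers(conds []Condition) []Condition {
	var out []Condition
	for _, c := range conds {
		if deletionConditionTypes[c.Type] && c.Status == "True" {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Example data only; the messages are made up for illustration.
	conds := []Condition{
		{Type: "NamespaceDeletionDiscoveryFailure", Status: "False", Message: "All resources successfully discovered"},
		{Type: "NamespaceContentRemaining", Status: "True", Message: "Some resources are remaining"},
	}
	for _, c := range deletionBlockers(conds) {
		fmt.Printf("%s: %s\n", c.Type, c.Message)
	}
}
```

The resources named by the blocking conditions are then the candidates for the manual deletion described above.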
## Implementation History

<!--