diff --git a/keps/sig-network/5343-kube-proxy-backends/README.md b/keps/sig-network/5343-kube-proxy-backends/README.md new file mode 100644 index 00000000000..b50e270bad0 --- /dev/null +++ b/keps/sig-network/5343-kube-proxy-backends/README.md @@ -0,0 +1,824 @@ +# KEP-5343: Updates to kube-proxy backend support + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [Determine a timeline for declaring nftables to be the "preferred" backend](#determine-a-timeline-for-declaring-nftables-to-be-the-preferred-backend) + - [Decide what to do about "the default kube-proxy mode"](#decide-what-to-do-about-the-default-kube-proxy-mode) + - [Deprecating the ipvs backend](#deprecating-the-ipvs-backend) + - [Deprecating the winkernel backend](#deprecating-the-winkernel-backend) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. 
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+With [KEP-3866] going GA, there are now four supported kube-proxy
+backends in-tree (`iptables`, `ipvs`, `nftables`, and `winkernel`).
+The nftables KEP noted that, as future work, we should figure out
+if/how we can deprecate either or both of the older Linux backends.
+There has also been sporadic discussion about deprecating the
+`winkernel` backend or moving it out of tree, since it is not clear
+that it benefits from being in-tree.
+
+Meanwhile, an earlier KEP, [KEP-3786], had proposed moving most of the
+shared kube-proxy code into the kube-proxy staging repo
+(`k8s.io/kube-proxy`) so that it could be more easily shared by
+out-of-tree service proxy implementations. That KEP was never merged,
+though there was general agreement with the high-level idea.
+
+The exact goals of this KEP are not finalized, but in general, the
+plan is to figure out what to do with kube-proxy backends.
+
+[KEP-3866]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/3866-nftables-proxy/README.md
+[KEP-3786]: https://github.com/kubernetes/enhancements/pull/3788
+
+## Motivation
+
+### Goals
+
+(These are kind of "Meta-Goals", since we still need to decide exactly
+what the Goals of the KEP are.)
+
+- Figure out the situation around "the default kube-proxy backend".
+
+- Figure out a plan for deprecating the `ipvs` backend.
+
+- Figure out (in conjunction with SIG Windows) what we want to do with
+  the `winkernel` backend.
+
+Assuming we end up deciding we want to move any proxy implementation
+out of tree, we will likely want to:
+
+- Refactor and redesign the kube-proxy "library" code
+  (`k8s.io/kubernetes/pkg/proxy`) to make it more usable and more
+  maintainable, and to fix existing bugs.
+
+- Move most of `k8s.io/kubernetes/pkg/proxy/` (except for the backend
+  implementations) into the kube-proxy staging repo,
+  `k8s.io/kube-proxy`, so it can be shared with out-of-tree backends
+  (see the illustrative sketch after this list).
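+
+To make the last two goals a bit more concrete, here is a purely
+illustrative sketch of the "shared library + pluggable backend" split
+they describe. Nothing below is an existing or proposed API: the
+`Provider` interface is a simplified, hypothetical stand-in, loosely
+modeled on the event-handler-plus-`Sync()` pattern that the in-tree
+backends in `k8s.io/kubernetes/pkg/proxy` already follow.
+
+```go
+// Illustrative sketch only; no such shared package exists today.
+package main
+
+import "fmt"
+
+// Provider is the kind of interface a shared proxy library might ask
+// an out-of-tree backend to implement: it receives Service (and, in
+// real code, EndpointSlice) events and is periodically asked to
+// resync the dataplane.
+type Provider interface {
+	OnServiceAdd(name string)
+	OnServiceDelete(name string)
+	Sync() // reprogram the dataplane to match the accumulated state
+}
+
+// countingBackend is a trivial "backend" that just tracks services.
+type countingBackend struct {
+	services map[string]struct{}
+}
+
+func (b *countingBackend) OnServiceAdd(name string)    { b.services[name] = struct{}{} }
+func (b *countingBackend) OnServiceDelete(name string) { delete(b.services, name) }
+func (b *countingBackend) Sync() {
+	fmt.Printf("would sync rules for %d services\n", len(b.services))
+}
+
+func main() {
+	var p Provider = &countingBackend{services: map[string]struct{}{}}
+	p.OnServiceAdd("default/kubernetes")
+	p.Sync()
+}
+```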
+
+### Non-Goals
+
+- We do not want to do anything that sounds like "removing kube-proxy
+  from core Kubernetes". In particular:
+
+  - We do not want to remove `cmd/kube-proxy` from
+    `k8s.io/kubernetes`.
+
+  - We do not want to stop building an official Linux kube-proxy
+    container image.
+
+## Proposal
+
+### Determine a timeline for declaring `nftables` to be the "preferred" backend
+
+We expect that *eventually* the `nftables` backend will be a better
+choice for everyone than the `iptables` backend. However, at the
+moment, it requires a new-ish kernel, which may be too new for some
+users.
+
+There has been some work to try to define a [minimum required kernel
+version for Kubernetes], though that work has been in progress for so
+long that the originally-proposed minimum version (4.19 LTS) is now
+EOL. The recommended minimum kernel version for nftables kube-proxy
+(5.13) is quite a bit newer than that, though, so it may be a while
+before we can assume nftables as the default for all users.
+
+[minimum required kernel version for Kubernetes]: https://github.com/kubernetes/kubernetes/issues/116799
+
+### Decide what to do about "the default kube-proxy mode"
+
+The `iptables` and `nftables` kube-proxy backends are not 100%
+compatible, both intentionally (e.g. changes to NodePort handling in
+`nftables`) and accidentally (some network plugins or other tools
+might do things like assume the existence of certain `iptables`
+chains). When we [changed the default proxy mode from `userspace` to
+`iptables`] (in Kubernetes 1.2), we decided to just not worry about
+compatibility problems. We can't plausibly do that this time.
+
+Rather than actually changing the default, we could try to get to a
+point where the "default" no longer matters, because everyone is
+specifying which proxy mode they want *explicitly*. We can start by
+having kube-proxy log warnings and Events if you don't explicitly
+specify a proxy mode, and eventually, after enough time has passed, we
+could make it an error to start kube-proxy without explicitly
+specifying the mode (see the sketch at the end of this section).
+
+Assuming the `v1alpha2` config file format happens, we could also
+make its `mode` field required and non-defaulted.
+
+Another possibility would be to deprecate the existing multi-mode
+kube-proxy binary in favor of having separate `kube-proxy-iptables`,
+`kube-proxy-ipvs`, and `kube-proxy-nftables` binaries (and perhaps,
+eventually, separate images). That would also work well with the plan
+to deprecate `ipvs` mode (and would allow us to completely remove
+the existing deprecated CLI options)...
+
+[changed the default proxy mode from `userspace` to `iptables`]: https://github.com/kubernetes/kubernetes/pull/16344
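+
+For reference, explicitly selecting a backend is already possible
+today, either with the `--proxy-mode` command-line flag or with the
+`mode` field of the kube-proxy configuration file. The options above
+would essentially make something like the following (a minimal
+sketch, not a complete configuration) mandatory rather than optional:
+
+```yaml
+# Minimal sketch of a kube-proxy config file; a real configuration
+# contains many more fields.
+apiVersion: kubeproxy.config.k8s.io/v1alpha1
+kind: KubeProxyConfiguration
+# Explicitly select a backend rather than relying on the default
+# (currently `iptables` on Linux when this field is left empty).
+mode: "nftables"
+```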
+
+### Deprecating the `ipvs` backend
+
+We will have to maintain both `nftables` and `iptables` for a while,
+but we would like to stop maintaining `ipvs` as soon as we can.
+
+At this point, there is no real reason to prefer the `ipvs` backend to
+`nftables`, other than it having slightly looser kernel version
+requirements. The `nftables` backend has slightly better performance
+than `ipvs` (plus a cleaner architecture and better feature parity
+with `iptables`), and none of the other features that make IPVS
+interesting (such as configurable schedulers) really apply to the case
+of a Kubernetes service proxy. (The simpler schedulers, like `rr`,
+could be implemented in the `nftables` backend if we cared, while the
+more complicated ones, like `mh`, end up being thwarted by the fact
+that the kube-proxy instances on different nodes don't share
+information with each other.)
+
+At this point, we are not likely to add support for new features to
+the `ipvs` backend (unless they come "for free" via the
+backend-independent code). We have also slowed down bugfixing for
+`ipvs`, and suggested that users filing bugs against it should try out
+the `nftables` backend instead.
+
+At some point we will need to take some more definitive action.
+Possibilities include:
+
+  - Logging warnings/Events about the fact that `ipvs` mode is
+    deprecated.
+
+  - Moving the `ipvs` mode out of the primary `kube-proxy` binary into
+    a separate `kube-proxy-ipvs` binary.
+
+  - Moving `kube-proxy-ipvs` to a staged repository.
+
+  - Moving `kube-proxy-ipvs` to a non-staged repository.
+
+  - Removing support for `ipvs` from kind and kubeadm.
+
+Since part of the goal is to reduce the maintenance burden on SIG
+Network, we wouldn't want to provide full support for
+`kube-proxy-ipvs`. Perhaps some community members would be willing to
+take up maintenance of it instead. Or, perhaps, we could simply
+abandon it after splitting it out, leaving it permanently stuck
+implementing only the features that it had at that time.
+
+### Deprecating the `winkernel` backend
+
+Although SIG Network is considered co-responsible for the `winkernel`
+kube-proxy backend, none of us understand it at all, and the handful
+of people who do work on it do not tend to interact with SIG Network
+very much.
+
+Additionally, like much of Kubernetes-on-Windows, the `winkernel`
+backend has lagged behind in functionality, and has several associated
+feature gates that have been stuck at Alpha or Beta for years.
+
+At one point, there was more effort going into the KPNG Windows
+backend than into the official kube-proxy backend, and perhaps some of
+that work could be recovered if the Windows proxy backend were moved
+out of tree.
+
+This could be done in the same way as described for the `ipvs` backend
+above, but the actual decisions about where to land the new code would
+presumably be made by SIG Windows.
+
+### Risks and Mitigations
+
+## Design Details
+
+### Test Plan
+
+[ ] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+##### Unit tests
+
+- ``: `` - ``
+
+##### Integration tests
+
+- :
+
+##### e2e tests
+
+- :
+
+### Graduation Criteria
+
+### Upgrade / Downgrade Strategy
+
+### Version Skew Strategy
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [ ] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name:
+  - Components depending on the feature gate:
+- [ ] Other
+  - Describe the mechanism:
+  - Will enabling / disabling the feature require downtime of the control
+    plane?
+  - Will enabling / disabling the feature require downtime or reprovisioning
+    of a node?
+
+###### Does enabling the feature change any default behavior?
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+ + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? + + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? + + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + + + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-network/5343-kube-proxy-backends/kep.yaml b/keps/sig-network/5343-kube-proxy-backends/kep.yaml new file mode 100644 index 00000000000..e3143773594 --- /dev/null +++ b/keps/sig-network/5343-kube-proxy-backends/kep.yaml @@ -0,0 +1,41 @@ +title: Updates to kube-proxy backend support +kep-number: 5343 +authors: + - "@danwinship" +owning-sig: sig-network +participating-sigs: + - sig-windows +status: provisional +creation-date: 2025-05-26 +reviewers: + - "@aojea" + - "@thockin" +approvers: + - "@aojea" + - "@thockin" +see-also: + - "/keps/sig-network/3866-nftables-proxy" +replaces: + +# The target maturity stage in the current dev cycle for this KEP. +# If the purpose of this KEP is to deprecate a user-visible feature +# and a Deprecated feature gates are added, they should be deprecated|disabled|removed. 
+stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.34" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.34" + beta: + stable: + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + +# The following PRR answers are required at beta release +metrics: