
WIP: KEP-5343: Updates to kube-proxy-backends #5344

Open: wants to merge 1 commit into master

Conversation

@danwinship (Contributor) commented May 27, 2025

  • One-line PR description: Post-KEP-3866, figure out what to do about "default" kube-proxy backend and deprecated backends

(This is WIP, but ready for review; the WIP-iness is about figuring out the scope/details of what we want to do.)

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 27, 2025
@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: danwinship

Approvers can indicate their approval by writing /approve in a comment, and can cancel approval by writing /approve cancel.

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 27, 2025
@k8s-ci-robot k8s-ci-robot requested a review from aojea May 27, 2025 02:09
@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label May 27, 2025
@k8s-ci-robot k8s-ci-robot requested a review from thockin May 27, 2025 02:09
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 27, 2025
Comment on lines +177 to +182
Another possibility would be to deprecate the existing multi-mode
kube-proxy binary in favor of having separate `kube-proxy-iptables`,
`kube-proxy-ipvs`, and `kube-proxy-nftables` binaries (and perhaps,
eventually, separate images). That would also work well with the plan
to deprecate `ipvs` mode (and would allow us to completely remove
the existing deprecated CLI options)...
Member:
this is painful for developers and users, more binaries means more images to maintain and to release

@danwinship (author):

Multiple binaries in the same image (built from nearly the same sources) would not really be much more work for anyone.

We could maybe even do the argv[0] hack and just have a single binary, but have it behave differently depending on whether you invoke it as `kube-proxy` or `kube-proxy-nftables`...
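(For readers unfamiliar with the argv[0] hack: BusyBox-style multi-call binaries pick their behavior from the name they were invoked under. A minimal sketch in Go, assuming hypothetical names like `backendFromInvocation` — this is illustrative, not actual kube-proxy code:)

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// backendFromInvocation maps the name the binary was invoked as
// (argv[0]) to a proxy backend. A plain "kube-proxy" invocation, or
// any unrecognized name, falls back to the given default.
// Hypothetical sketch; the function and backend names are illustrative.
func backendFromInvocation(argv0, fallback string) string {
	switch filepath.Base(argv0) {
	case "kube-proxy-iptables":
		return "iptables"
	case "kube-proxy-ipvs":
		return "ipvs"
	case "kube-proxy-nftables":
		return "nftables"
	default:
		return fallback
	}
}

func main() {
	// os.Args[0] is whatever name the binary was executed under,
	// e.g. a symlink such as /usr/local/bin/kube-proxy-nftables.
	mode := backendFromInvocation(os.Args[0], "iptables")
	fmt.Println("proxy mode:", mode)
}
```

In the image, the extra names would then just be symlinks to the one binary (e.g. `ln -s kube-proxy kube-proxy-nftables`), which is exactly how BusyBox ships its applets.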

Member:

I see, I was assuming independent artifacts

@danwinship (author):

Well, both options (multiple binaries in one image, or multiple binaries in multiple images) are things to consider. Clearly the single-image version is simpler, though.


- Moving `kube-proxy-ipvs` to a staged repository.

- Moving `kube-proxy-ipvs` to a non-staged repository.
@aojea (Member) commented May 30, 2025:

I would just fork the entire code into its own repo and open it up to new maintainers; basically I'm advocating for this option, and for what you conclude in the next paragraph.

@aojea (Member) commented May 30, 2025

A good exercise for people willing to help would be to create a standalone repo with the ipvs proxy and the Windows proxy from the existing code in k/k, to show feasibility... I think that if that works, we just start the deprecation period and point the people who want to use them to the new repo...

@shaneutt (Member) commented Jun 5, 2025

/cc

@k8s-ci-robot k8s-ci-robot requested a review from shaneutt June 5, 2025 16:11

## Proposal

### Determine a timeline for declaring `nftables` to be the "preferred" backend
Member:

Spitballing an idea here: as a GKE user, we just use whatever GKE gives us. Last I checked, on v1.31 clusters we get iptables. I'd very much like us to move to nftables, exposing me and our team to nftables, and also helping shake out any bugs we may hit.

In other KEPs I've seen graduation criteria include conditions saying that enough clouds need to use a feature before it can be declared GA... I wonder if we need to do something similar here? Something along the lines of "go convince X providers to make nftables the default".
I assume a cloud provider will need some sort of motivation to do so, so I guess we may need to give them a reason to do that too?

@danwinship (author):

Graduation criteria like that are generally for features that are implemented outside of k/k. You can't declare a NetworkPolicy or LoadBalancer feature GA if network plugins / cloud providers haven't implemented it.

The motivation for clouds/distros/etc to move to nftables is that it's faster and more efficient. If nobody was moving to nftables by default, that would probably be a good signal that there's something wrong and we need to deal with it before declaring nftables default. But I don't think it quite works in the other direction: the fact that Amazon and Google have decided to ship new-enough kernels that they can support nftables doesn't imply that everyone else is running new-enough kernels that they can support nftables...


- Figure out the situation around "the default kube-proxy backend".

- Figure out a plan for deprecating the `ipvs` backend.
Member:

On the SIG Network call, I asked if it was possible to move this item into its own KEP, so we could move on it faster.
What was the reason that wasn't a good idea?

@danwinship (author):

It's not necessarily not a good idea. I was saying the ipvs plan may want to be informed by what we decide about making nftables the default, but I guess we're already telling people to stop using it either way, so maybe it does make sense to split it out.

Member:

> I was saying the ipvs plan may want to be informed by what we decide about making nftables the default, but I guess we're already telling people to stop using it either way, so maybe it does make sense to split it out.

Since ipvs is opt-in (i.e. it's not the default), I assume people have a reason they want to use it. My assumption is that the reason is performance. So in theory nftables fixes that for them, and no matter what we do with the default, they can choose to use nftables too.

I also assume that iptables is here to stay, for now.

@danwinship (author):

Right, but nftables only fixes it for them if they can run nftables, and currently the kernel requirement for being able to run nftables is much more recent than for kubernetes as a whole (and we don't currently have good information about what percentage of kubernetes users have a new-enough kernel to support nftables kube-proxy).

Also FWIW, the most recent "maybe you shouldn't use ipvs" issue was filed against 1.27, which is before even alpha nftables...
