Container networking failures on hosts with net.ipv4.conf.all.arp_ignore=2
#3880
Comments
I don't know if this is better solved in the CNI plugin itself, so that it does not depend on kernel parameters. Over Christmas I created my own CNI plugin and I use the "onlink" flag to avoid ARP. @BenTheElder maybe we should switch to the cni-kindnet plugin, which already supports portmap and is a single binary, instead of having to chain multiple plugins like we do today: https://kindnet.es/docs/design/cni/
I checked that the onlink flag does not work with arp_ignore=2: #3882. @shaneutt this is a per-interface option; this should be fixed in the containernetworking ptp CNI plugin. cc: @squeed
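For context, a rough illustration of what an "onlink" route looks like (the gateway address and interface name are made-up examples): onlink only skips the kernel's check that the gateway is on-link when the route is installed, but the gateway still has to answer ARP when traffic is actually sent, and arp_ignore=2 blocks exactly that when the sender is not in the same subnet.

```sh
# 'onlink' lets this route be installed even though 169.254.1.1 is outside
# eth0's configured subnet; no ARP resolution happens at install time.
ip route add default via 169.254.1.1 dev eth0 onlink

# Sending traffic still requires ARP-resolving 169.254.1.1. A peer running
# net.ipv4.conf.all.arp_ignore=2 will not reply, because the sender's address
# is not in the same subnet as the requested address, so packets never leave.
ping -c 1 10.96.0.1   # fails with "Destination Host Unreachable" / no route to host
```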
The fix sounds simple: just set arp_ignore to 0 on the interface; that is already done there for other settings.
Hmm, no, it seems `all` overrides the per-interface setting :/
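For reference, a rough sketch of that precedence (the interface name eth0 is an assumption): for arp_ignore the kernel uses the maximum of the `all` and per-interface values, so lowering only the per-interface value has no effect while `all` is 2.

```sh
sysctl net.ipv4.conf.all.arp_ignore         # -> 2 on the affected hosts
sysctl -w net.ipv4.conf.eth0.arp_ignore=0   # no effect: max(2, 0) is still 2
sysctl -w net.ipv4.conf.all.arp_ignore=1    # lowering 'all' is what actually changes behavior
```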
I honestly do not know what the best place to handle this is.
@shaneutt I didn't find any place that indicates where arp_ignore is defaulted to 2. Is that set by the distro, or is some other software setting it? The only things I found are guides for configuring the host with that option, such as https://discourse.ubuntu.com/t/ubuntu-24-04-server-diy-router-project-with-ipv6-and-wireguard/52102 , but that is a user action.
We should reasonably attempt to defend against user actions anyhow.
we should only set it here if we think it will be desirable without kindnetd in use.
Interesting. It does add a bit more to maintain and patch (deps) here though.
The distribution sets this, and (as you came to realize above) the global setting overrides individual interface settings. Whatever the host has set gets passed down to containers in the container runtimes I have tested for this. The most recent Ubuntu desktop is the only distribution I've run into so far that defaults this to 2 (as noted in my original description, the server edition at the same version does not seem to).
Switching seems fine, but it is also a large undertaking, so to me at least it seems reasonable, as a stop-gap, to set this in an opinionated manner until that is completed. My thinking is that Ubuntu desktop is a fairly popular place to use kind.
/sig network
Something like this may be the best solution, if we expect it to be reasonable with other CNI installations. (...yes?) We generally want to respect host settings (e.g. MTU) but this seems like one we might want to just always override at the node level. |
Agreed. @shaneutt do you mind changing your PR to define this value in https://github.com/kubernetes-sigs/kind/tree/main/images/base/files/etc/sysctl.d and adding a comment with the rationale?
Sure, no problem! 🖖
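For reference, a minimal sketch of what such a sysctl.d drop-in might look like; the file name and the exact value are assumptions, not the merged change:

```
# images/base/files/etc/sysctl.d/99-kind-arp-ignore.conf (hypothetical name)
#
# Some hosts (e.g. recent Ubuntu desktop) set net.ipv4.conf.all.arp_ignore=2,
# and the node inherits that setting from the host. In that mode the kernel
# drops ARP requests whose sender is not in the same subnet as the target,
# which breaks pod networking because pod routes use a /32 gateway.
# Pin a permissive value inside the node so the host setting is not inherited.
net.ipv4.conf.all.arp_ignore = 0
```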
What happened:
I use kind across a variety of modern Linux distributions. I deployed it today on a recent release of Ubuntu and noticed that the `local-path-provisioner` was failing right after cluster creation. The logs showed `dial tcp 10.96.0.1:443: connect: no route to host`. Digging in deeper, I created some pods and noticed that when I exec into them they are unable to send traffic to each other.

What you expected to happen:
The `local-path-provisioner` should be able to start successfully, and pod networking should be functional.

Environment:
- kind v0.27.0 go1.22.2 linux/amd64
- podman version 4.9.3
- Ubuntu 24.04.2 LTS amd64
- Kubernetes: Client v1.32.2, Server v1.32.2
- All default settings
How to reproduce it (as minimally and precisely as possible):
1. Ubuntu 24.04.2 LTS
2. `apt-get install podman -y`
3. `go install sigs.k8s.io/kind@v0.27.0`
4. `kind create cluster`
At this point you will find that the `local-path-provisioner` is failing, and any pods you start can't access the network.

Anything else we need to know?:
Stepping through some standard diagnostics, I found pretty quickly that I was unable to `arping` the default route IP from inside these containers, but `tcpdump` did show those ARP requests arriving on the node's veth. This led me to discover that `net.ipv4.conf.all.arp_ignore=2` was set, which causes the ARP requests to go unanswered: in this mode the kernel ignores requests when the sender's address and the requested address are not on the same subnet (kindnet configures a /32). So it appears that kind has historically relied on `net.ipv4.conf.all.arp_ignore=0` being set, which has been the common default across a variety of distributions, but some distributions set it higher.

I ran the following:
sysctl -w net.ipv4.conf.all.arp_ignore=1
and the `local-path-provisioner` came up and pod networking started working.

As such I have created a patch which has worked for me in my testing, for your consideration:
#3881
However, I'm a little perplexed: I would be kind of surprised if I were the first person to find this. Did I do something weird here? I promise I searched around; if I missed another report staring me right in the face, I'm sorry! 😂
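For anyone hitting the same symptom, a rough sketch of the diagnostics described above, assuming the default node container name (`kind-control-plane`), a podman host, and that the tools are available in the pod and node images; pod, gateway, and interface names are placeholders:

```sh
# From inside an affected pod: note the default gateway, then try to ARP it
kubectl exec -it <pod> -- ip route show default
kubectl exec -it <pod> -- arping -c 3 <gateway-ip>       # times out

# On the node, the ARP requests do arrive on the pod's veth (tcpdump may need
# to be installed in the node image first)...
podman exec -it kind-control-plane tcpdump -ni <veth> arp

# ...but the kernel never answers because of the inherited host setting
podman exec -it kind-control-plane sysctl net.ipv4.conf.all.arp_ignore   # -> 2
```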