CNI Pod Subnet ignores Pod Subnet UDR #3204
Comments
Hey @david-garcia-garcia, I don't think this is the intended behavior with podsubnet. Can you share the az CLI command or ARM template used to create your environment so we can reproduce it?
@rbtr I'll be working on isolating our TF templates to build a minimal setup that reproduces the issue. This will take some time. In the meantime, I'll document here what we observe on the cluster that might be related to this issue. Since setting up this cluster (our first one with a pod subnet; we've always used mixed node+pod subnets without issues), we have identified at least two additional networking components misbehaving. azure-cns on Windows nodes (mcr.microsoft.com/containernetworking/azure-cns:v1.6.13) will eventually lose its authentication to the cluster's management API. Note that this is not happening to CNS on the Linux nodes.
I've examined the historical logs for that pod, and this starts happening out of nowhere. No error messages prior to the failure give any additional hint about what happened to authentication. Something odd also happens with the connectivity agent (mcr.microsoft.com/oss/kubernetes/apiserver-network-proxy/agent:v0.30.3-hotfix.20240819): after working fine for some time, the agents start spamming messages such as:
We are investigating these internally and engaging with Azure support. I'll post any progress here.
What happened:
When using CNI Pod Subnet with Dynamic Allocation, pod traffic is entirely governed by the node subnet's NSG and routes. I am not even sure that UDRs are applied to the pod subnets at all.
Also, this statement from the docs appears to be wrong: "the pod IP is always the source address for any traffic from the pod".
From the docs:
Based on the VNet flow logs and the observed effect of FW rules, everything points to packets ultimately being routed/SNATed through, and affected by, the node subnet's route table and NSG. This happens even though the node and pod subnets are different subnets with different policies (route tables and NSGs).
The only official docs about CNI Pod Subnet seem to be:
Other links of interest:
At the end of that post they mention this:
So this talks about "Dynamic IP allocation" vs. non-dynamic IP allocation. Whether you are using one or the other seems to be related to a preview feature:
and to having the monitoring add-on enabled:
At this point I am not even sure which CNI networking mode my AKS cluster is using. To my understanding, dynamic allocation is the default behaviour, and static allocation needs to be specified with (--pod-ip-allocation-mode StaticBlock), which is not yet available on the Terraform resource for AKS.
Example
The pod subnet has a route to the FW (0.0.0.0/0 -> VA) and the node subnet has the same route. The FW sees traffic as coming from the node IP instead of the pod IP, so it is not possible to have FW rules specific to pod subnets.
Flow logs show that the packet is going through the nodepool's NSG with a source IP corresponding to the actual node:
10.32.0.6 is the Node's IP
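To check flow log entries like this systematically, a small helper can classify each source IP by subnet. This is only a minimal sketch: the two CIDRs below are hypothetical placeholders (the actual subnet ranges are not given in this issue), and `classify_source` is an illustrative name, not part of any Azure tooling.

```python
import ipaddress

# Hypothetical CIDRs for illustration only; replace with the real
# node and pod subnet ranges of the cluster's VNet.
NODE_SUBNET = ipaddress.ip_network("10.32.0.0/24")
POD_SUBNET = ipaddress.ip_network("10.32.16.0/20")


def classify_source(ip: str) -> str:
    """Return which subnet a flow log source IP belongs to."""
    addr = ipaddress.ip_address(ip)
    if addr in POD_SUBNET:
        return "pod"
    if addr in NODE_SUBNET:
        return "node"
    return "other"


# 10.32.0.6 is the node IP seen in the flow logs above.
print(classify_source("10.32.0.6"))
```

If pod traffic kept its pod IP as the source address, entries for pod-originated flows would be classified as "pod" rather than "node".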
What you expected to happen:
Routing and NSG rules in a pod subnet when using CNI Pod Subnet should be applied independently of the node subnet's UDR and NSG.
How to reproduce it:
Orchestrator and Version (e.g. Kubernetes, Docker):
Kubernetes: v1.30.5
Operating System (Linux/Windows):
Windows
Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):
10.0.20348.2031 (WinBuild.160101.0800)
Anything else we need to know?:
I am not sure if this is a bug, a misconfiguration, or a misinterpretation of what the 'CNI Pod Subnet' feature is supposed to be.