
CNI Pod Subnet ignores Pod Subnet UDR #3204

Open
david-garcia-garcia opened this issue Nov 26, 2024 · 2 comments
Labels: swift (Related to SWIFT networking.)

Comments


david-garcia-garcia commented Nov 26, 2024

What happened:

When using CNI Pod Subnet with dynamic allocation, pod traffic is entirely governed by the node subnet's NSG and routes. I am not even sure that UDRs are applied to the pod subnets at all.

Also, the statement "the pod IP is always the source address for any traffic from the pod" from the docs appears to be wrong.

From the docs:

Separate VNet policies for pods: Since pods have a separate subnet, you can configure separate VNet policies for them that are different from node policies. This enables many useful scenarios such as allowing internet connectivity only for pods and not for nodes, fixing the source IP for pod in a node pool using an Azure NAT Gateway, and using NSGs to filter traffic between node pools.

Examining VNet flow logs and the observable impact of firewall rules, everything points to packets ultimately being routed/SNATed through, and affected by, the node subnet's route table and NSG, even though the node and pod subnets are different subnets with different policies (route tables and NSGs).

The only official docs about CNI Pod Subnet seem to be:

Other links of interest:

At the end of that post they mention this:

What source IP do external systems see for traffic that originates in an Azure CNI-enabled pod?

Systems in the same virtual network as the AKS cluster see the pod IP as the source address for any traffic from the pod. Systems outside the AKS cluster virtual network see the node IP as the source address for any traffic from the pod. But for Azure CNI dynamic IP allocation, no matter the connection is inside the same virtual network or cross virtual networks, the pod IP is always the source address for any traffic from the pod. This is because the Azure CNI for dynamic IP allocation implements Microsoft Azure Container Networking infrastructure, which gives end-to-end experience. Hence, it eliminates the use of ip-masq-agent, which is still used by traditional Azure CNI.

So this distinguishes between "dynamic IP allocation" and non-dynamic IP allocation. Whether you are using one or the other seems to be related to a preview feature:

Register the subscription-level feature flag for your subscription: 'Microsoft.ContainerService/AzureVnetScalePreview'.

and to having the monitoring add-on enabled:

az aks enable-addons --addons monitoring --name <cluster-name> --resource-group <resource-group-name>
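For reference, a hedged sketch of how that preview flag would be registered with the Azure CLI (the feature name comes from the quoted doc; worth double-checking against the current AKS docs):

# Register the subscription-level preview feature mentioned above
az feature register --namespace Microsoft.ContainerService --name AzureVnetScalePreview

# Wait until the state reports "Registered", then refresh the resource provider
az feature show --namespace Microsoft.ContainerService --name AzureVnetScalePreview --query properties.state
az provider register --namespace Microsoft.ContainerService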


At this point I am not even sure which CNI networking mode my AKS cluster is using. To my understanding, dynamic allocation is the default behaviour, and static allocation needs to be specified explicitly (--pod-ip-allocation-mode StaticBlock), which is not yet available in the Terraform AKS resource.
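For what it's worth, here is a hedged az CLI sketch of how the allocation mode can be set explicitly (and inspected) while the flag is missing from Terraform; the placeholder names are mine, and I'm assuming the node pool surfaces the value as podIpAllocationMode:

# Create a node pool with a dedicated pod subnet and an explicit allocation mode
az aks nodepool add \
  --resource-group <resource-group-name> \
  --cluster-name <cluster-name> \
  --name np1 \
  --vnet-subnet-id <node-subnet-id> \
  --pod-subnet-id <pod-subnet-id> \
  --pod-ip-allocation-mode StaticBlock

# Inspect which mode an existing node pool ended up with
# (assuming the property is exposed as podIpAllocationMode)
az aks nodepool show \
  --resource-group <resource-group-name> \
  --cluster-name <cluster-name> \
  --name np1 \
  --query podIpAllocationMode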

Example

The pod subnet has a route to the firewall (0.0.0.0/0 -> virtual appliance) and the node subnet has the same route. The firewall sees the traffic coming from the node IP instead of the pod IP as the source address, so it is not possible to have firewall rules specific to the pod subnets.

Flow logs show that the packet goes through the node pool's NSG with a source IP corresponding to the actual node:

10.32.0.6 is the Node's IP

"1732605379536,10.32.0.6,217.148.136.64,48003,80,6,O,B,NX,0,0,0,0",
"1732605379568,10.32.0.6,217.148.136.64,48003,80,6,O,E,NX,3,383,2,220",
"1732605379568,10.32.0.6,217.148.136.64,48003,80,6,O,B,NX,0,0,0,0",

What you expected to happen:

When using CNI Pod Subnet, routing and NSG rules on a pod subnet should be applied independently of the node subnet's UDR and NSG.

How to reproduce it:

  • Create an AKS cluster with 4 subnets: node0, node1, pods0, pods1
  • Default node pool uses node subnet node0 and pod subnet pods0
  • Windows node pool uses node subnet node1 and pod subnet pods1
  • The pod has internet access
  • Add a route [0.0.0.0/0 -> None] to the pods1 subnet (see the CLI sketch after this list); the expected result is that the pod loses internet connectivity, but nothing happens
  • Add the same route [0.0.0.0/0 -> None] to the node1 subnet; now the pod loses internet connectivity
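A minimal az CLI sketch of the black-hole route step, assuming the pods1 subnet lives in a VNet called <vnet-name> in <resource-group-name>:

# Create a route table with a 0.0.0.0/0 -> None (black-hole) route
az network route-table create \
  --resource-group <resource-group-name> \
  --name rt-pods1

az network route-table route create \
  --resource-group <resource-group-name> \
  --route-table-name rt-pods1 \
  --name blackhole-all \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type None

# Associate it with the pods1 subnet; the pod should (but does not) lose internet access
az network vnet subnet update \
  --resource-group <resource-group-name> \
  --vnet-name <vnet-name> \
  --name pods1 \
  --route-table rt-pods1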

Orchestrator and Version (e.g. Kubernetes, Docker):

Kubernetes: v1.30.5

Operating System (Linux/Windows):

Windows

Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):

10.0.20348.2031 (WinBuild.160101.0800)

Anything else we need to know?:

I am not sure if this is a bug, a misconfiguration, or a misinterpretation of what the 'CNI Pod Subnet' feature is supposed to be.

@david-garcia-garcia changed the title to CNI Pod Subnet ignores Pod Subnet UDR on Nov 26, 2024
@rbtr added the swift (Related to SWIFT networking.) label on Nov 27, 2024
rbtr (Contributor) commented Nov 27, 2024

Hey @david-garcia-garcia, I don't think this is the intended behavior with pod subnet. Can you share the az CLI command or ARM template used to create your environment so we can reproduce it?

david-garcia-garcia (Author) commented

@rbtr I'll be working on isolating our TF templates to build a minimal setup that reproduces the issue. This will take some time.

In the meantime I'll document here what we observe on the cluster that might be related to this issue.

Since setting up this cluster (the first one with a dedicated pod subnet; we've always used mixed node+pod subnets without issues) we have identified at least two additional networking components misbehaving.

azure-cns on Windows nodes (mcr.microsoft.com/containernetworking/azure-cns:v1.6.13) will eventually lose its authentication to the cluster's management API. Note that this is not happening to CNS on the Linux nodes.
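For anyone wanting to look at the same thing, the excerpt below was captured from one of the azure-cns pods in kube-system, roughly like this (the pod name is just an example):

# Find the azure-cns pod running on the affected Windows node, then tail its logs
kubectl -n kube-system get pods -o wide | grep azure-cns
kubectl -n kube-system logs azure-cns-win-xxxxx --tail=200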

W1201 06:13:15.922611    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1alpha.NodeNetworkConfig: Unauthorized
E1201 06:13:15.922611    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1alpha.NodeNetworkConfig: failed to list *v1alpha.NodeNetworkConfig: Unautho
W1201 06:13:43.201915    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1.Pod: Unauthorized
E1201 06:13:43.201915    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
{"level":"info","ts":"2024-12-01T06:14:01.519Z","caller":"v2/monitor.go:124","msg":"calculated new request","component":"ipam-pool-monitor","demand":8,"batch":16,"max":50,"buffer":0.5,"target":16}
{"level":"info","ts":"2024-12-01T06:14:01.519Z","caller":"v2/monitor.go:127","msg":"NNC already at target IPs, no scaling required","component":"ipam-pool-monitor"}
W1201 06:14:04.724173    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1alpha.NodeNetworkConfig: Unauthorized
E1201 06:14:04.724173    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1alpha.NodeNetworkConfig: failed to list *v1alpha.NodeNetworkConfig: Unautho
W1201 06:14:34.731016    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1.Pod: Unauthorized
E1201 06:14:34.731016    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
W1201 06:14:35.139979    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1alpha.NodeNetworkConfig: Unauthorized
E1201 06:14:35.139979    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1alpha.NodeNetworkConfig: failed to list *v1alpha.NodeNetworkConfig: Unautho
{"level":"info","ts":"2024-12-01T06:15:01.520Z","caller":"v2/monitor.go:124","msg":"calculated new request","component":"ipam-pool-monitor","demand":8,"batch":16,"max":50,"buffer":0.5,"target":16}
{"level":"info","ts":"2024-12-01T06:15:01.520Z","caller":"v2/monitor.go:127","msg":"NNC already at target IPs, no scaling required","component":"ipam-pool-monitor"}
W1201 06:15:09.553802    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1alpha.NodeNetworkConfig: Unauthorized
E1201 06:15:09.553802    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1alpha.NodeNetworkConfig: failed to list *v1alpha.NodeNetworkConfig: Unautho
W1201 06:15:22.718539    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1.Pod: Unauthorized
E1201 06:15:22.718539    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
W1201 06:15:55.630620    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1alpha.NodeNetworkConfig: Unauthorized
E1201 06:15:55.630620    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1alpha.NodeNetworkConfig: failed to list *v1alpha.NodeNetworkConfig: Unautho
W1201 06:16:01.109326    6540 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1.Pod: Unauthorized
E1201 06:16:01.109326    6540 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
{"level":"info","ts":"2024-12-01T06:16:01.520Z","caller":"v2/monitor.go:124","msg":"calculated new request","component":"ipam-pool-monitor","demand":8,"batch":16,"max":50,"buffer":0.5,"target":16}
{"level":"info","ts":"2024-12-01T06:16:01.520Z","caller":"v2/monitor.go:127","msg":"NNC already at target IPs, no scaling required","component":"ipam-pool-monitor"}

I've examined the historical logs for that pod and this starts happening out of nowhere. No specific error messages prior to the failure give any additional hint on what happened to authentication.

Something weird also happens with the connectivity agent (mcr.microsoft.com/oss/kubernetes/apiserver-network-proxy/agent:v0.30.3-hotfix.20240819): after working fine for some time, it starts spamming messages such as:

I1201 06:22:08.160235       1 client.go:528] "remote connection EOF" connectionID=6981
I1201 06:22:25.587138       1 client.go:528] "remote connection EOF" connectionID=6982
I1201 06:22:28.154312       1 client.go:528] "remote connection EOF" connectionID=6983
I1201 06:22:34.797901       1 client.go:528] "remote connection EOF" connectionID=6886
I1201 06:22:45.592290       1 client.go:528] "remote connection EOF" connectionID=6887
I1201 06:22:48.198009       1 client.go:528] "remote connection EOF" connectionID=6984
I1201 06:22:54.892502       1 client.go:528] "remote connection EOF" connectionID=6985
I1201 06:23:11.087465       1 client.go:528] "remote connection EOF" connectionID=6889
I1201 06:23:14.787701       1 client.go:528] "remote connection EOF" connectionID=6890
I1201 06:23:25.578375       1 client.go:528] "remote connection EOF" connectionID=6892
I1201 06:23:28.184660       1 client.go:528] "remote connection EOF" connectionID=6893
I1201 06:23:34.801468       1 client.go:528] "remote connection EOF" connectionID=6986
I1201 06:23:48.151358       1 client.go:528] "remote connection EOF" connectionID=6987
I1201 06:23:54.211852       1 client.go:528] "remote connection EOF" connectionID=6895
I1201 06:24:11.084989       1 client.go:528] "remote connection EOF" connectionID=6989
I1201 06:24:45.577507       1 client.go:528] "remote connection EOF" connectionID=6990
I1201 06:24:48.151196       1 client.go:528] "remote connection EOF" connectionID=6991
I1201 06:24:51.096125       1 client.go:528] "remote connection EOF" connectionID=6896
I1201 06:24:54.802809       1 client.go:528] "remote connection EOF" connectionID=6992

We are investigating these internally and engaging with Azure support; I'll post any progress here.
