Timing issues when a cluster restarts #10859
-
We have a small budget for infrastructure, so for our dev environment we shut down the cluster when we don't need it, usually after work hours. We are testing linkerd before rolling it out to production. When we start the cluster, some pods start before linkerd and end up outside the mesh. The only solution so far is manually killing the pod; when it's recreated, the proxy is injected. Is there an official way to solve this issue? Or is the init container approach the way to go?
-
@xolott hey, thanks for raising this.

I assume you use the CNI plugin in your environment. Unfortunately, this is a shortcoming in the CNI spec, and the issue isn't exclusive to linkerd-cni. We have some measures in place to limit the blast radius as much as possible, but we do not control the scheduler or the kubelet runtime that's responsible for starting the pods. You can look at #8070 for a more detailed overview (and all of its associated issues).

In the latest stable version, each injected pod (when using the CNI plugin) will receive a validator init container that will fail if the pod ran before the CNI plugin was configured. This will effectively stop your pod from rolling out. You can write a controller to detect these init container failures if you want an automated way to restart them. Otherwise, if you don't mind granting additional permissions, the init container that uses iptables might be a better choice.