Timing issues when a cluster restarts #10859
-
We have a small budget for infrastructure, so for our dev environment we shut down the cluster when we don't need it, usually after work hours. We are testing linkerd before rolling it out to production. When we start the cluster, some pods start before linkerd and end up outside the mesh. The only solution so far is manually killing the pod; when it's recreated, the proxy is injected. Is there an official way to solve this issue? Or is the init container approach the way to go?
-
@xolott hey, thanks for raising this.

I assume you use the CNI plugin in your environment. Unfortunately, this is a shortcoming in the CNI spec, and the issue isn't exclusive to linkerd-cni. We have some measures in place to limit the blast radius as much as possible, but we do not control the scheduler or the kubelet runtime that's responsible for starting the pods. You can look at #8070 for a more detailed overview (and all of its associated issues).

In the latest stable version, each injected pod (when using the CNI plugin) will receive a validator init container that will fail if the pod ran before the CNI plugin was configured. This will effectively stop your pod from rolling out. You can write a controller to detect these init container failures if you want an automated way to restart them. Otherwise, if you don't mind granting additional permissions, the init container that uses iptables might be a better choice.