Occasional control plane health check failures #11135
This discussion started at the Buoyant forum here. While considering the recommendation in that post to revert to the default 3 replicas for the control plane, I noticed that our destination pods restart occasionally (roughly once per day or every other day). I found some linkerd logs that coincide with some of the liveness health check failures for the pods. I'm not sure how to interpret them, but am following @wmorgan's suggestion to create an issue here.

Here are some of the logs from one of the destination pods that failed a liveness check. I'm not sure if or how they're related, though. There are also tons more logs, and I'm not sure how to find the interesting ones, so I'm just grabbing a few for now.

It's probably worth noting that other than when the pods fail health checks, they don't really produce any logs. (Well, that's besides a lot of logs which contain …)

We're running on 2.13.5. What can I do to further investigate or mitigate these restarts?
Replies: 2 comments 1 reply
Here's another hunk of logs from a different time that look interesting as well.
Thanks for the detailed report @nealharris. This indeed uncovers a race condition that I'm tracking in #11163. When we fix this in the `main` branch, it should be a good candidate to get back-ported into 2.13.