Scalability issues in 2.13.x #11129
Replies: 6 comments 1 reply
-
Can you please provide all the logs for that Destination pod, with the containers' log levels set to …
-
I don't see any clear sign in the logs of an error that would cause the containers to crash. Could you share the …
-
Here's the describe output for one of the destination pods. I've attached files for all events so you can see the order in which they occurred, along with the individual pod describes.
linkerd-destination-6c9b766689-qx9fk_describe.txt
-
Just following up here. We haven't been able to reproduce this issue yet. We fixed something similar in #11135, but since there are no panics here, it's not clear whether it's the same underlying issue. We've also fixed #11162 and #11055, but it's not obvious that these are related. We're going to release 2.13.6 later this week, and it would be great if you could try that release; at a minimum, it should reduce some of the log noise.
-
Thanks for the update @wmorgan! #11055 sounds somewhat promising. I'll test the upgrade again once 2.13.6 is released and report back.
-
I would like to report that we have been running v2.13.6 successfully for over 12 hours, and the issues with the … Thank you to the Linkerd team for investigating the issue and getting the fix deployed.
-
Overview
Upgrading Linkerd from 2.12.5 to 2.13.5 results in the destination pods entering a crashloop on startup. The logs show that the destination container's gRPC server fails to start, and there is a flood of proxy errors that eventually clears when a smaller set of pods is running in the mesh. The cluster consists of ~2300 meshed pods. In terms of workarounds for these errors, two options have worked:
linkerd.io/inject: "disabled" on a set of deployments
Another point of interest is that on 2.12.5 the linkerd-proxy container used ~200MiB of memory on average. Since upgrading to 2.13.5, the linkerd-proxy container consistently uses ~600MiB on average.
Attempted fixes
Versions
Check command
Destination pod logs snippet (all containers)
Destination container only logs