Feature Request: Allow agents in client mode to permanently leave the cluster when leave_on_terminate is enabled
Proposal
We suggest adding an option for agents in client mode to fully leave the cluster when leave_on_terminate is set to true. Currently, when a client node is terminated, it is marked as down but stays in the cluster. Instead, it should be marked as left and then removed from the node pool.
Use Cases
Right now, there’s no way to tell the difference between a node that’s down due to an error and one that’s down because it left the cluster. This can cause problems like:
Monitoring Issues: False alerts might be triggered because monitoring systems can’t distinguish between failed nodes and nodes that were intentionally terminated.
Manual Cleanup: Admins have to run system gc to remove the terminated nodes and clear the resulting false alerts, which adds unnecessary overhead.
This feature would be helpful in situations like:
Downscaling/Autoscaling: When reducing the number of client nodes or using autoscaling, terminated nodes should cleanly exit the cluster.
Immutable Infrastructure: In setups where instances are frequently recreated, nodes should leave the cluster cleanly when terminated to avoid stale entries.
Proposed Solution
We propose adding a configuration option that lets agents in client mode fully leave the cluster when terminated. This would:
Mark the node as left when the agent is terminated.
Automatically remove the node from the node pool after a configurable time.
This change would make cluster management easier, reduce manual work, and improve monitoring accuracy.
Thanks for considering this request!
Thanks for opening this issue. I took a look through the code, and can confirm what you are seeing. When the client receives a shutdown signal, it exits without notifying the server. There is an existing drain_on_shutdown configuration that drains the node.
I think this is a reasonable request, but it may take a good bit of work to implement, so I'll mark it for roadmapping.
In the meantime, you can also use the node_gc_threshold configuration to help with some of the manual gc'ing, although I do realize this won't help with alerts.
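As a rough sketch of that interim setup, the agent configuration might look something like the following. The option names are the ones mentioned in this thread; the values (the 1h drain deadline and the 1h GC threshold) are illustrative assumptions rather than recommendations, and the placement of each option should be checked against the agent configuration docs for your Nomad version.

```hcl
# Client agent config (sketch). Values are illustrative assumptions.

# Treat SIGTERM as a leave signal rather than a plain shutdown.
leave_on_terminate = true

client {
  enabled = true

  # Drain allocations off the node when the agent receives a leave signal.
  drain_on_shutdown {
    deadline           = "1h"
    force              = false
    ignore_system_jobs = false
  }
}

# Server agent config (sketch), kept in the servers' own config file.
server {
  enabled = true

  # How long a node must be in a terminal state before it is eligible for GC.
  # Lowering this makes down nodes disappear sooner without a manual system gc.
  node_gc_threshold = "1h"
}
```

Note that this only shortens how long terminated clients linger as down. It does not mark them as left, so alerts that fire on down nodes can still trigger until the node is garbage collected.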