Errors/Incorrect Behaviour Encountered
With Cluster.Strategy.DNSPoll, the maximum stable cluster size is the number of DNS results returned.
Description of issue
What are the expected results?
After a DNS query, I would not expect nodes to be removed just because they are missing from the DNS response. I would expect libcluster to trust the normal disconnect that happens when a node times out under net_ticktime, rather than actively removing nodes. For example, if you have 15 nodes and each DNS reply contains only 5 random node IPs, the cluster becomes unstable, because nodes missing from a reply are removed.
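To make the 15-node example concrete, here is a minimal sketch of a single poll. It is not libcluster's actual code: the `PruneSketch` module, the `reconcile/3` function, and the node names are all hypothetical, and the only assumption is that a pruning strategy disconnects every connected node that is absent from the latest DNS reply.

```elixir
defmodule PruneSketch do
  # Hypothetical reconciliation step, for illustration only.
  # Compares the currently connected nodes with the latest DNS reply.
  def reconcile(connected, dns_results, prune?) do
    connected_set = MapSet.new(connected)
    dns_set = MapSet.new(dns_results)

    # Nodes reported by DNS that are not connected yet.
    to_connect = MapSet.difference(dns_set, connected_set)

    # With pruning, every connected node missing from the reply is dropped.
    to_disconnect =
      if prune? do
        MapSet.difference(connected_set, dns_set)
      else
        MapSet.new()
      end

    %{connect: MapSet.to_list(to_connect), disconnect: MapSet.to_list(to_disconnect)}
  end
end

# 15 connected nodes, but the DNS reply only carries 5 random ones.
all_nodes = for i <- 1..15, do: :"app@10.0.0.#{i}"
dns_reply = Enum.take_random(all_nodes, 5)

pruned = PruneSketch.reconcile(all_nodes, dns_reply, true)
kept = PruneSketch.reconcile(all_nodes, dns_reply, false)

IO.puts("with pruning: #{length(pruned.disconnect)} nodes dropped")    # prints 10
IO.puts("without pruning: #{length(kept.disconnect)} nodes dropped")  # prints 0
```

Under that assumption, every poll drops 10 of the 15 nodes, and a later poll with a different random subset would reconnect some and drop others, which matches the instability described above.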
Is the documentation incorrect?
The documentation does not mention that nodes will be removed when they are no longer in DNS. It just says:
"this strategy will periodically poll DNS and connect all nodes it finds."
Should we introduce a config flag to turn off removing nodes?
I'd be open to accepting a PR that makes removing nodes in this strategy optional based on a flag, something like prune: false to disable pruning the node list. We actively prune nodes when the source of data for the strategy (DNS in this case, but it could be any system providing service discovery) no longer reports a node as being part of the cluster. I believe there was a reason for that, and it was a specific choice, but I can't recall the specifics at the moment. libcluster largely defers to the source registry to tell us which nodes belong in the cluster. In the case of DNS, it is unusual for a node to disappear from DNS unless it is being permanently removed, but I can imagine scenarios where this might happen, such as under k8s or some other orchestrator that uses DNS for service discovery.
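As a rough sketch of what the proposed flag might look like in a topology config: the `prune: false` key below is hypothetical and only mirrors the name suggested in this comment, so it should not be read as an existing libcluster option. The topology name, hostname, and basename are placeholder values.

```elixir
# config/runtime.exs (sketch only)
import Config

config :libcluster,
  topologies: [
    dns_poll_example: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        polling_interval: 5_000,
        query: "my-app.example.com",
        node_basename: "my-app",
        # Hypothetical flag proposed in this thread; not an existing option.
        # The idea: keep nodes connected even when a DNS reply omits them,
        # and leave it to net_ticktime to detect nodes that are really gone.
        prune: false
      ]
    ]
  ]
```

Defaulting such a flag to the current pruning behaviour would keep existing deployments unchanged, and only clusters whose DNS returns a partial record set would need to opt out.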