Evict slow/hanged node that still responds to network #510

Open
MagnusKlingenberg opened this issue Jun 27, 2018 · 0 comments

We have a 3-node PXC cluster with ProxySQL in front, where all writes go to one node. Everything is managed by Severalnines ClusterControl.

We have had multiple complete outages caused by one of the PXC nodes hanging.

Scenario 1: Hanging non-writer node
The underlying hardware causes all writes to the binlog to hang.
After a while the PXC node can't complete commits.
Later the whole cluster stops due to flow control.
evs.auto_evict does nothing, since the node still responds to network activity.

Possible solution: Allow a node to be evicted if it falls too far behind on writes.
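
One way to approximate "too far behind" is to look at Galera status variables on the node. The sketch below is only an illustration of the idea, not an existing PXC feature: it assumes the values of `wsrep_local_recv_queue` and `wsrep_flow_control_paused` have already been read (for example with `SHOW GLOBAL STATUS`), and the function name and both thresholds are hypothetical.

```python
def is_lagging(status, max_recv_queue=1000, max_paused_fraction=0.5):
    """Return True if a node looks too far behind to stay in the cluster.

    `status` maps Galera status variable names to their values as read
    from SHOW GLOBAL STATUS. Both thresholds are made-up illustration
    values, not recommendations.
    """
    # Write sets received but not yet applied on this node.
    recv_queue = int(status.get("wsrep_local_recv_queue", 0))
    # Fraction of time replication was paused by flow control.
    paused = float(status.get("wsrep_flow_control_paused", 0.0))
    return recv_queue > max_recv_queue or paused > max_paused_fraction


if __name__ == "__main__":
    sample = {"wsrep_local_recv_queue": "25000", "wsrep_flow_control_paused": "0.9"}
    print(is_lagging(sample))
```

A check like this would have to run outside the hung node's commit path, since in this scenario the node itself cannot complete writes.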

Scenario 2: Hanging writer node
The underlying hardware causes all writes to the binlog to hang.
After a while the PXC node can't complete commits.
The ProxySQL Galera checker script reports that everything is OK, so traffic is not moved to a new node.

Possible solution: Allow a node to evict itself if it is unable to perform commits.

How to reproduce:
Install a regular Galera cluster with 3 nodes.
Make sure the binlog is located on a separate partition.
Keep a steady stream of write queries going to one of the nodes.
Run "fsfreeze -f /mount/for/binlog/".
Wait for the cluster to stop serving queries.
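
When scripting the reproduction it is easy to leave the filesystem frozen if the test aborts. The following is a minimal sketch of a wrapper that always thaws the mount again with `fsfreeze -u`. The helper name `frozen` and the injectable `run` parameter are assumptions made for illustration, and the real commands need root.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen(mountpoint, run=subprocess.run):
    """Freeze `mountpoint` for the duration of the with-block, then thaw it.

    `run` is injectable so the sketch can be dry-run without root.
    """
    run(["fsfreeze", "-f", mountpoint], check=True)
    try:
        yield
    finally:
        # Always thaw, even if the body of the with-block raises.
        run(["fsfreeze", "-u", mountpoint], check=True)


# Example (as root, on a test cluster only):
#
#     with frozen("/mount/for/binlog/"):
#         time.sleep(300)  # observe the cluster while the binlog mount hangs
```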

Workaround:
We have currently deployed a workaround in which a script regularly tries to update a file on /mount/for/binlog. If the update takes more than X seconds to complete, the script blocks all network traffic to the other Galera nodes. That way the node drops out of the cluster and ProxySQL can find a new node to write to.
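
The workaround could look roughly like the sketch below. This is not the actual script: the probe file name, the function names, and the `on_hang` callback are hypothetical. How traffic gets blocked (for example with firewall rules) is left to `on_hang`, because the details are not given here.

```python
import os
import threading
import time


def touch(directory):
    """Write and fsync a small probe file in `directory`."""
    path = os.path.join(directory, ".write_probe")
    with open(path, "w") as f:
        f.write(str(time.time()))
        f.flush()
        os.fsync(f.fileno())


def write_completes(directory, timeout_s, writer=touch):
    """Return True if `writer(directory)` finishes within `timeout_s` seconds."""
    done = threading.Event()

    def attempt():
        try:
            writer(directory)
        except OSError:
            return  # treat a failed write like a hung one
        done.set()

    # Daemon thread: a write stuck on a frozen filesystem must not block
    # the watchdog itself.
    threading.Thread(target=attempt, daemon=True).start()
    return done.wait(timeout_s)


def watch(directory, timeout_s, interval_s, on_hang):
    """Probe `directory` repeatedly; call `on_hang` once on the first miss."""
    while write_completes(directory, timeout_s):
        time.sleep(interval_s)
    on_hang()
```

The write runs in a separate thread because a write to a frozen filesystem blocks until the mount is thawed. The watchdog only waits on an event with a timeout, so it can still react while the probe thread is stuck.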
