We have a 3-node PXC cluster with ProxySQL in front, where all writes go to a single node. Everything is managed by Severalnines ClusterControl.
However, we have had several complete outages caused by one of the PXC nodes hanging.
Scenario 1: Hanging non-write node
The underlying hardware causes all writes to the binlog to hang.
After a while, the PXC node can no longer complete commits.
Later, the whole cluster stops due to flow control.
evs.auto_evict does nothing, since the node still responds to network activity.
Possible solution: allow a node to be evicted if it falls far behind on writes.
Scenario 2: Hanging write node
The underlying hardware causes all writes to the binlog to hang.
After a while, the PXC node can no longer complete commits.
The ProxySQL Galera checker script reports all OK, so traffic is not moved to a new node.
Possible solution: allow a node to evict itself if it is unable to perform commits.
How to reproduce:
Install a regular Galera cluster with 3 nodes.
Make sure the binlog is located on a separate partition.
Keep a steady stream of write queries going to one of the nodes.
Run "fsfreeze -f /mount/for/binlog/"
Wait for the cluster to stop serving queries.
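The "steady stream of write queries" and "wait for the cluster to stop" steps can be sketched in Python roughly as follows. This is a minimal sketch, assuming a hypothetical `do_write(seq)` callable that you supply (for example, one that runs a single INSERT through your MySQL driver and commits); no driver is imported here. Writes are issued back to back in a background thread that records when the last one completed, so a commit that hangs after the fsfreeze shows up as a growing gap instead of hanging the test script itself.

```python
import threading
import time
from typing import Callable, Optional


class WriteStream:
    """Issue writes back to back in a background thread and track progress."""

    def __init__(self, do_write: Callable[[int], None], pause_s: float = 0.01):
        self.do_write = do_write  # hypothetical: runs one INSERT and commits
        self.pause_s = pause_s
        self.completed = 0
        self.last_ok = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def _run(self) -> None:
        seq = 0
        while not self._stop.is_set():
            self.do_write(seq)  # blocks once the node can no longer commit
            seq += 1
            with self._lock:
                self.completed += 1
                self.last_ok = time.monotonic()
            time.sleep(self.pause_s)

    def start(self) -> None:
        threading.Thread(target=self._run, daemon=True).start()

    def stop(self) -> None:
        self._stop.set()

    def stalled_for(self) -> float:
        """Seconds since the last write completed."""
        with self._lock:
            return time.monotonic() - self.last_ok


def wait_for_stall(stream: WriteStream, threshold_s: float,
                   timeout_s: float) -> Optional[float]:
    """Return the stall length once it reaches threshold_s, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        stalled = stream.stalled_for()
        if stalled >= threshold_s:
            return stalled
        time.sleep(0.05)
    return None
```

A write that raises an exception also ends the background thread, so it is reported as a stall too, which is usually what you want when reproducing this.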
Workaround:
We have currently deployed a workaround: a script regularly tries to update a file on /mount/for/binlog, and if that takes more than X seconds to complete, it blocks all network traffic to the other Galera nodes. That way the node drops out of the cluster and ProxySQL can find a new node to write to.
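A watchdog of that kind could look roughly like the Python sketch below; it is an illustration of the idea, not the deployed script. The probe file name, the 10 second threshold (standing in for X), the check interval and the peer IPs are hypothetical placeholders. The probe runs in a daemon thread so that a write blocked on the frozen filesystem cannot also hang the watchdog, and isolation uses plain iptables DROP rules for all traffic to and from the other Galera nodes.

```python
import os
import subprocess
import threading
import time
from typing import Callable, List

# Hypothetical settings; adjust to your environment.
PROBE_FILE = "/mount/for/binlog/.write_probe"  # file on the binlog mount
PROBE_TIMEOUT_S = 10.0                          # stands in for "X seconds"
CHECK_INTERVAL_S = 5.0
PEER_IPS = ["10.0.0.2", "10.0.0.3"]             # the other Galera nodes


def probe_write(path: str) -> None:
    """Rewrite a small file and fsync it; this blocks while the fs is frozen."""
    with open(path, "w") as f:
        f.write(f"{time.time()}\n")
        f.flush()
        os.fsync(f.fileno())


def completes_within(fn: Callable[[], None], timeout_s: float) -> bool:
    """Run fn in a daemon thread; False if it raises or is still blocked at timeout."""
    done = threading.Event()
    failed: List[BaseException] = []

    def worker() -> None:
        try:
            fn()
        except Exception as exc:  # an I/O error also counts as a failed probe
            failed.append(exc)
        finally:
            done.set()

    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout_s) and not failed


def isolation_commands(peers: List[str]) -> List[List[str]]:
    """iptables rules that drop all traffic to and from the other Galera nodes."""
    cmds: List[List[str]] = []
    for ip in peers:
        cmds.append(["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"])
        cmds.append(["iptables", "-I", "OUTPUT", "-d", ip, "-j", "DROP"])
    return cmds


def main() -> None:
    """Probe periodically; isolate the node once and stop if a probe fails."""
    while True:
        if not completes_within(lambda: probe_write(PROBE_FILE), PROBE_TIMEOUT_S):
            for cmd in isolation_commands(PEER_IPS):
                subprocess.run(cmd, check=True)
            return  # stay isolated; an operator re-joins the node by hand
        time.sleep(CHECK_INTERVAL_S)
```

`main()` needs root for iptables and is deliberately not invoked at import; run it from a service manager on each node. Once a node has isolated itself, the rules must be removed by hand (for example with `iptables -D` and the same rule arguments) before it can rejoin the cluster.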