Let's say that you have a cluster in which SSL encryption is used for Galera replication traffic. If you try to add an arbitrator (garbd) node to the cluster without configuring the proper socket.ssl_* options in wsrep_provider_options, then the arbitrator will not be able to join the cluster due to SSL errors, which makes complete sense.
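For reference, SSL for garbd is typically enabled by passing the same socket.ssl_* provider options that the cluster nodes use. A minimal sketch, assuming a RHEL-style /etc/sysconfig/garb file (other distributions use an equivalent defaults file) and placeholder certificate paths:

# /etc/sysconfig/garb (or the distribution's equivalent defaults file)
GALERA_NODES="172.30.0.249:4567 172.30.0.46:4567 172.30.0.32:4567"
GALERA_GROUP="my_wsrep_cluster"
# The certificate paths below are placeholders; they must point at the same
# CA and at a key/certificate pair that the cluster nodes trust.
GALERA_OPTIONS="socket.ssl=yes; socket.ssl_key=/etc/ssl/galera/garb-key.pem; socket.ssl_cert=/etc/ssl/galera/garb-cert.pem; socket.ssl_ca=/etc/ssl/galera/ca-cert.pem"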
Unfortunately, when this happens, the arbitrator's log does not make it clear that it failed to connect due to SSL errors. Instead, the log suggests that the arbitrator node actually timed out.
For example, here is the error log from an arbitrator node that does not have SSL enabled, in a cluster that does use SSL:
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: Starting garbd
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.156 INFO: CRC-32C: using hardware acceleration.
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.156 INFO: Read config:
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: daemon: 0
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: name: garb
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: address: gcomm://172.30.0.249:4567,172.30.0.46:4567,172.30.0.32:4567
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: group: my_wsrep_cluster
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: sst: trivial
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: donor:
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: options: gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: cfg:
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: log:
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.157 INFO: protonet asio version 0
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.157 INFO: Using CRC-32C for message checksums.
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.158 INFO: backend: asio
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.158 INFO: gcomm thread scheduling priority set to other:0
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.158 WARN: access file(./gvwstate.dat) failed(No such file or directory)
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.158 INFO: restore pc from disk failed
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.158 INFO: GMCast version 0
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.158 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.158 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.159 INFO: EVS version 0
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.159 INFO: gcomm: connecting to group 'my_wsrep_cluster', peer '172.30.0.249:4567,172.30.0.46:4567,172.30.0.32:4567'
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.160 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection established to fbff02f5 tcp://172.30.0.32:4567
Jan 2 16:00:38 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:38.160 WARN: (fbff02f5, 'tcp://0.0.0.0:4567') address 'tcp://172.30.0.32:4567' points to own listening address, blacklisting
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:41.160 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.249:4567 timed out, no messages seen in PT3S
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:41.160 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer fbff02f5 with addr tcp://172.30.0.32:4567 timed out, no messages seen in PT3S
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:41.160 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.46:4567 timed out, no messages seen in PT3S
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:41.160 WARN: no nodes coming from prim view, prim not possible
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:41.160 INFO: view(view_id(NON_PRIM,fbff02f5,1) memb {
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: fbff02f5,0
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: } joined {
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: } left {
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: } partitioned {
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: })
Jan 2 16:00:41 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:41.660 WARN: last inactive check more than PT1.5S ago (PT3.50168S), skipping check
Jan 2 16:00:45 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:45.161 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.249:4567 timed out, no messages seen in PT3S
Jan 2 16:00:48 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:48.161 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.46:4567 timed out, no messages seen in PT3S
Jan 2 16:00:51 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:51.162 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.249:4567 timed out, no messages seen in PT3S
Jan 2 16:00:54 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:54.162 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.46:4567 timed out, no messages seen in PT3S
Jan 2 16:00:57 ip-172-30-0-32 garb-systemd: 2018-01-02 16:00:57.163 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.249:4567 timed out, no messages seen in PT3S
Jan 2 16:01:00 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:00.164 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.46:4567 timed out, no messages seen in PT3S
Jan 2 16:01:03 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:03.164 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.249:4567 timed out, no messages seen in PT3S
Jan 2 16:01:06 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:06.165 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.46:4567 timed out, no messages seen in PT3S
Jan 2 16:01:09 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:09.165 INFO: (fbff02f5, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.30.0.249:4567 timed out, no messages seen in PT3S
Jan 2 16:01:11 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:11.169 INFO: view((empty))
Jan 2 16:01:11 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:11.169 ERROR: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Jan 2 16:01:11 ip-172-30-0-32 garb-systemd: at gcomm/src/pc.cpp:connect():158
Jan 2 16:01:11 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:11.169 ERROR: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
Jan 2 16:01:11 ip-172-30-0-32 systemd: garb.service: main process exited, code=exited, status=1/FAILURE
Jan 2 16:01:11 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:11.170 ERROR: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'my_wsrep_cluster' at 'gcomm://172.30.0.249:4567,172.30.0.46:4567,172.30.0.32:4567': -110 (Connection timed out)
Jan 2 16:01:11 ip-172-30-0-32 garb-systemd: 2018-01-02 16:01:11.170 FATAL: Exception in creating receive loop: Failed to open connection to group: 110 (Connection timed out)
Jan 2 16:01:11 ip-172-30-0-32 garb-systemd: at garb/garb_gcs.cpp:Gcs():35
Jan 2 16:01:11 ip-172-30-0-32 systemd: Unit garb.service entered failed state.
Jan 2 16:01:11 ip-172-30-0-32 systemd: garb.service failed.
The messages in the error log appear to be incorrect to me. They should probably report that an SSL error occurred, rather than a connection timeout.
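One way to confirm that these timeouts are really failed SSL handshakes is to probe a cluster node's replication port directly, e.g. with an address from the log above:

# The handshake may not fully complete without a client certificate,
# but any TLS output at all shows the port is not speaking plain TCP.
openssl s_client -connect 172.30.0.249:4567 </dev/null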
Since the arbitrator's log doesn't really make much sense in this instance, the user might think to check the error logs of the cluster nodes for a hint of what is happening. Unfortunately, when SSL-related errors occur, the error logs of the cluster nodes (running MariaDB 10.1) usually do not contain any relevant messages either. However, if the error was specifically caused by an incompatible cipher choice, then the cluster node's log might contain an error message like the following:
Please fix this issue. I'm trying to use a more secure cipher because a vulnerability scan is complaining about insecure ciphers, but whenever I change it on the arbitrator, I receive the exact same error message as you do.
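For anyone hitting the cipher variant of this: the socket.ssl_cipher value generally has to match on every cluster node and on the arbitrator. A sketch of the matching settings, with an illustrative cipher name and placeholder paths (not taken from the original report):

# On each cluster node, in my.cnf:
[mysqld]
wsrep_provider_options="socket.ssl=yes;socket.ssl_cipher=AES256-SHA;socket.ssl_key=/etc/ssl/galera/server-key.pem;socket.ssl_cert=/etc/ssl/galera/server-cert.pem;socket.ssl_ca=/etc/ssl/galera/ca-cert.pem"

# On the arbitrator, the same socket.ssl_cipher value in GALERA_OPTIONS:
GALERA_OPTIONS="socket.ssl=yes; socket.ssl_cipher=AES256-SHA; socket.ssl_key=/etc/ssl/galera/garb-key.pem; socket.ssl_cert=/etc/ssl/galera/garb-cert.pem; socket.ssl_ca=/etc/ssl/galera/ca-cert.pem"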