Donor - Cannot assign requested address #103

Jaykah · 2014-09-02T16:08:20Z

Galera 25.3.5-wheezy(rXXXX)
Percona XtraDB 5.5.37-MariaDB-35.0
Built in Docker from https://github.com/neildunbar/mariadb55
Ubuntu trusty

Started first node in Donor mode
Connected node 2 and 3
They got in sync, everything was working fine

Until this happens to the donor:


140827 8:54:35 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.1.5:4567
140827 8:54:35 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.52987S), skipping check
140827 8:54:35 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') turning message relay requesting off
140827 8:54:35 [Note] WSREP: view(view_id(NON_PRIM,1df53171-2d4d-11e4-829e-c29c0950cb7d,3) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
})
140827 8:54:35 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140827 8:54:35 [Note] WSREP: Flow-control interval: [16, 16]
140827 8:54:35 [Note] WSREP: Received NON-PRIMARY.
140827 8:54:35 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 714927)
140827 8:54:35 [Note] WSREP: view(view_id(NON_PRIM,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,4) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
})
140827 8:54:35 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140827 8:54:35 [Note] WSREP: Flow-control interval: [16, 16]
140827 8:54:35 [Note] WSREP: Received NON-PRIMARY.
140827 8:54:36 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.1.11:4567 tcp://10.0.1.5:4567
140827 8:54:37 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to d616fec0-2d4d-11e4-b768-d28e90659024 (tcp://10.0.1.11:4567), attempt 0
140827 8:54:37 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to 1df53171-2d4d-11e4-829e-c29c0950cb7d (tcp://10.0.1.5:4567), attempt 0
140827 8:54:38 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to d616fec0-2d4d-11e4-b768-d28e90659024 (tcp://10.0.1.11:4567), attempt 0
140827 8:54:38 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to 1df53171-2d4d-11e4-829e-c29c0950cb7d (tcp://10.0.1.5:4567), attempt 0
140827 8:54:40 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to d616fec0-2d4d-11e4-b768-d28e90659024 (tcp://10.0.1.11:4567), attempt 0
140827 8:54:40 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to 1df53171-2d4d-11e4-829e-c29c0950cb7d (tcp://10.0.1.5:4567), attempt 0
140827 8:54:40 [Warning] WSREP: Send action {(nil), 360, TORDERED} returned -107 (Transport endpoint is not connected)
140827 8:54:40 [Note] WSREP: evs::proto(aaa04c84-2d4c-11e4-a1fe-c605ab19a328, GATHER, view_id(REG,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,4)) suspecting node: 1df53171-2d4d-11e4-829e-c29c0950cb7d
140827 8:54:40 [Note] WSREP: evs::proto(aaa04c84-2d4c-11e4-a1fe-c605ab19a328, GATHER, view_id(REG,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,4)) suspecting node: d616fec0-2d4d-11e4-b768-d28e90659024
140827 8:54:41 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to d616fec0-2d4d-11e4-b768-d28e90659024 (tcp://10.0.1.11:4567), attempt 0
140827 8:54:41 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to 1df53171-2d4d-11e4-829e-c29c0950cb7d (tcp://10.0.1.5:4567), attempt 0
140827 8:54:41 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') turning message relay requesting off
140827 8:54:41 [Warning] WSREP: subsequent views have same members, prev view view(view_id(REG,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,4) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
}) current view view(view_id(REG,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,5) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
})
140827 8:54:41 [Note] WSREP: view(view_id(NON_PRIM,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,5) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
})
140827 8:54:41 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140827 8:54:41 [Note] WSREP: Flow-control interval: [16, 16]
140827 8:54:41 [Note] WSREP: Received NON-PRIMARY.
140827 8:54:41 [Note] WSREP: declaring 1df53171-2d4d-11e4-829e-c29c0950cb7d stable
140827 8:54:41 [Note] WSREP: declaring d616fec0-2d4d-11e4-b768-d28e90659024 stable
140827 8:54:41 [Note] WSREP: Node 1df53171-2d4d-11e4-829e-c29c0950cb7d state prim
140827 8:54:41 [Note] WSREP: view(view_id(PRIM,1df53171-2d4d-11e4-829e-c29c0950cb7d,6) memb {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
} joined {
} left {
} partitioned {
})
140827 8:54:41 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: sent state msg: c87d2c4a-2dc7-11e4-97a4-16ee176669aa
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: got state msg: c87d2c4a-2dc7-11e4-97a4-16ee176669aa from 0 ()
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: got state msg: c87d2c4a-2dc7-11e4-97a4-16ee176669aa from 1 (283c4a3333ec)
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: got state msg: c87d2c4a-2dc7-11e4-97a4-16ee176669aa from 2 ()
140827 8:54:41 [Note] WSREP: Quorum results:
 version = 3,
 component = PRIMARY,
 conf_id = 4,
 members = 2/3 (joined/total),
 act_id = 714953,
 last_appl. = 714853,
 protocols = 0/5/2 (gcs/repl/appl),
 group UUID = 40179a77-2bf4-11e4-b46a-a30695b570ce
140827 8:54:41 [Note] WSREP: Flow-control interval: [28, 28]
140827 8:54:41 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 714953)
140827 8:54:45 [Note] WSREP: New cluster view: global state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714927, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
140827 8:54:45 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140827 8:54:45 [Note] WSREP: New cluster view: global state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714927, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
140827 8:54:45 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140827 8:54:45 [Note] WSREP: New cluster view: global state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714927, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
140827 8:54:45 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140827 8:54:45 [Note] WSREP: State transfer required:
 Group state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714953
 Local state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714927
140827 8:54:45 [Note] WSREP: New cluster view: global state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714953, view# 5: Primary, number of nodes: 3, my index: 1, protocol version 2
140827 8:54:45 [Warning] WSREP: Gap in state sequence. Need state transfer.
140827 8:54:47 [Note] WSREP: You have configured 'xtrabackup-v2' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
140827 8:54:47 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140827 8:54:47 [Note] WSREP: REPL Protocols: 5 (3, 1)
140827 8:54:47 [Note] WSREP: Service thread queue flushed.
140827 8:54:47 [Note] WSREP: Assign initial position for certification: 714953, protocol version: 3
140827 8:54:47 [Note] WSREP: Service thread queue flushed.
140827 8:54:47 [Warning] WSREP: Releasing seqno 714927 before 714928 was assigned.
140827 8:54:48 [Warning] WSREP: Failed to prepare for incremental state transfer: Failed to open IST listener at tcp://10.0.1.10:4568', asio error 'Cannot assign requested address': 99 (Cannot assign requested address)
  at galera/src/ist.cpp:prepare():334. IST will be unavailable.
140827 8:54:48 [Note] WSREP: Member 1.0 (283c4a3333ec) requested state transfer from '*any*'. Selected 0.0 ()(SYNCED) as donor.
140827 8:54:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 714955)
140827 8:54:48 [Note] WSREP: Requesting state transfer: success, donor: 0
140827 8:54:48 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 0 bytes
140827 8:54:48 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 1/2 locked buffers
140827 8:54:48 [Warning] WSREP: 0.0 (): State transfer to 1.0 (283c4a3333ec) failed: -125 (Operation canceled)
140827 8:54:48 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():723: Will never receive state. Need to abort.
140827 8:54:48 [Note] WSREP: gcomm: terminating thread
140827 8:54:48 [Note] WSREP: gcomm: joining thread
140827 8:54:48 [Note] WSREP: gcomm: closing backend
140827 8:54:48 [Note] WSREP: view(view_id(NON_PRIM,1df53171-2d4d-11e4-829e-c29c0950cb7d,6) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
})
140827 8:54:48 [Note] WSREP: view((empty))
140827 8:54:48 [Note] WSREP: gcomm: closed
140827 8:54:48 [Note] WSREP: /usr/sbin/mysqld: Terminated.

This has happened multiple times (read- happens almost every day), usually after 10 or so hours of operation.

The port 4568 is exposed and mapped (-p 4568:4568).

The text was updated successfully, but these errors were encountered:

ayurchen · 2014-09-02T16:57:14Z

It does not matter if port is "exposed". Does 10.0.1.10 exist? Can another application like nc be started to listen at that address? What does netstat say at that point?
I'd also try to figure out what is happening so that it prints
140827 8:54:35 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.52987S), skipping check

Jaykah · 2014-09-02T18:51:00Z

Thanks Alex!

Looks like you are spot-on on the address assignment. I have installed netcat and it is only able to attach with docker in --net=host mode. This is strange because none of the Docker Galera implementations that I could find on the net mention using --net=host (or exposing 4568 for that matter).

I guess I'll treat "packaged" builds with a grain of salt going forward :)

I will update this issue if there are any further developments in regards to the matter in question.

Before we close it out though - I have a question pertaining to the second point you've outlined

I'd also try to figure out what is happening so that it prints 140827 8:54:35 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.52987S), skipping check

What does it mean, and could it be related to the issue in question? If not, I would really appreciate pointers as to how to solve it, as I don't want to experience the implications of the matter prior to having an understanding of how to solve it :)

ayurchen · 2014-09-03T11:14:28Z

it means that timer thread in galera could not get CPU time for more than 1.5 seconds. In lay terms, the mysqld process was blocked - and that is what caused cluster partitioning. It is likely an IO issue.

Jaykah · 2014-09-03T18:00:44Z

I am using Azure, and I get the same message across servers, so it's unlikely a disk issue.

Can it be related to using AUFS as my Docker storage backend?

Are there any other scenarios where this error might pop up?

Once again, thank you for your help!

ayurchen · 2014-09-03T19:28:20Z

Disk or not, but the symptom is really simple: timer thread was stuck for more than 1.5 seconds. And given that other nodes discarded this node from the group means that from their perspective that node was stuck too. I guess you'll have to observe the system more closely to figure out what is going on. Never used Azure, so no idea.

bradjones1 · 2016-12-09T22:37:35Z

This is an old thread (and can probably be closed) but for anyone in a similar situation where your advertised address to other nodes differs from the bind address for the IST transfer, consider the ist.recv_bind provider option.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Donor - Cannot assign requested address #103

Donor - Cannot assign requested address #103

Jaykah commented Sep 2, 2014

ayurchen commented Sep 2, 2014

Jaykah commented Sep 2, 2014

ayurchen commented Sep 3, 2014

Jaykah commented Sep 3, 2014

ayurchen commented Sep 3, 2014

bradjones1 commented Dec 9, 2016

Donor - Cannot assign requested address #103

Donor - Cannot assign requested address #103

Comments

Jaykah commented Sep 2, 2014

ayurchen commented Sep 2, 2014

Jaykah commented Sep 2, 2014

ayurchen commented Sep 3, 2014

Jaykah commented Sep 3, 2014

ayurchen commented Sep 3, 2014

bradjones1 commented Dec 9, 2016