Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Donor - Cannot assign requested address #103

Open
Jaykah opened this issue Sep 2, 2014 · 6 comments
Open

Donor - Cannot assign requested address #103

Jaykah opened this issue Sep 2, 2014 · 6 comments

Comments

@Jaykah
Copy link

Jaykah commented Sep 2, 2014

Galera 25.3.5-wheezy(rXXXX)
Percona XtraDB 5.5.37-MariaDB-35.0
Built in Docker from https://github.com/neildunbar/mariadb55
Ubuntu trusty

  1. Started first node in Donor mode
  2. Connected node 2 and 3
  3. They got in sync, everything was working fine

Until this happens to the donor:


140827 8:54:35 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.1.5:4567
140827 8:54:35 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.52987S), skipping check
140827 8:54:35 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') turning message relay requesting off
140827 8:54:35 [Note] WSREP: view(view_id(NON_PRIM,1df53171-2d4d-11e4-829e-c29c0950cb7d,3) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
})
140827 8:54:35 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140827 8:54:35 [Note] WSREP: Flow-control interval: [16, 16]
140827 8:54:35 [Note] WSREP: Received NON-PRIMARY.
140827 8:54:35 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 714927)
140827 8:54:35 [Note] WSREP: view(view_id(NON_PRIM,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,4) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
})
140827 8:54:35 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140827 8:54:35 [Note] WSREP: Flow-control interval: [16, 16]
140827 8:54:35 [Note] WSREP: Received NON-PRIMARY.
140827 8:54:36 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.1.11:4567 tcp://10.0.1.5:4567
140827 8:54:37 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to d616fec0-2d4d-11e4-b768-d28e90659024 (tcp://10.0.1.11:4567), attempt 0
140827 8:54:37 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to 1df53171-2d4d-11e4-829e-c29c0950cb7d (tcp://10.0.1.5:4567), attempt 0
140827 8:54:38 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to d616fec0-2d4d-11e4-b768-d28e90659024 (tcp://10.0.1.11:4567), attempt 0
140827 8:54:38 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to 1df53171-2d4d-11e4-829e-c29c0950cb7d (tcp://10.0.1.5:4567), attempt 0
140827 8:54:40 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to d616fec0-2d4d-11e4-b768-d28e90659024 (tcp://10.0.1.11:4567), attempt 0
140827 8:54:40 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to 1df53171-2d4d-11e4-829e-c29c0950cb7d (tcp://10.0.1.5:4567), attempt 0
140827 8:54:40 [Warning] WSREP: Send action {(nil), 360, TORDERED} returned -107 (Transport endpoint is not connected)
140827 8:54:40 [Note] WSREP: evs::proto(aaa04c84-2d4c-11e4-a1fe-c605ab19a328, GATHER, view_id(REG,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,4)) suspecting node: 1df53171-2d4d-11e4-829e-c29c0950cb7d
140827 8:54:40 [Note] WSREP: evs::proto(aaa04c84-2d4c-11e4-a1fe-c605ab19a328, GATHER, view_id(REG,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,4)) suspecting node: d616fec0-2d4d-11e4-b768-d28e90659024
140827 8:54:41 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to d616fec0-2d4d-11e4-b768-d28e90659024 (tcp://10.0.1.11:4567), attempt 0
140827 8:54:41 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') reconnecting to 1df53171-2d4d-11e4-829e-c29c0950cb7d (tcp://10.0.1.5:4567), attempt 0
140827 8:54:41 [Note] WSREP: (aaa04c84-2d4c-11e4-a1fe-c605ab19a328, 'tcp://0.0.0.0:4567') turning message relay requesting off
140827 8:54:41 [Warning] WSREP: subsequent views have same members, prev view view(view_id(REG,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,4) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
}) current view view(view_id(REG,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,5) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
})
140827 8:54:41 [Note] WSREP: view(view_id(NON_PRIM,aaa04c84-2d4c-11e4-a1fe-c605ab19a328,5) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
})
140827 8:54:41 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140827 8:54:41 [Note] WSREP: Flow-control interval: [16, 16]
140827 8:54:41 [Note] WSREP: Received NON-PRIMARY.
140827 8:54:41 [Note] WSREP: declaring 1df53171-2d4d-11e4-829e-c29c0950cb7d stable
140827 8:54:41 [Note] WSREP: declaring d616fec0-2d4d-11e4-b768-d28e90659024 stable
140827 8:54:41 [Note] WSREP: Node 1df53171-2d4d-11e4-829e-c29c0950cb7d state prim
140827 8:54:41 [Note] WSREP: view(view_id(PRIM,1df53171-2d4d-11e4-829e-c29c0950cb7d,6) memb {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
} joined {
} left {
} partitioned {
})
140827 8:54:41 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: sent state msg: c87d2c4a-2dc7-11e4-97a4-16ee176669aa
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: got state msg: c87d2c4a-2dc7-11e4-97a4-16ee176669aa from 0 ()
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: got state msg: c87d2c4a-2dc7-11e4-97a4-16ee176669aa from 1 (283c4a3333ec)
140827 8:54:41 [Note] WSREP: STATE EXCHANGE: got state msg: c87d2c4a-2dc7-11e4-97a4-16ee176669aa from 2 ()
140827 8:54:41 [Note] WSREP: Quorum results:
 version = 3,
 component = PRIMARY,
 conf_id = 4,
 members = 2/3 (joined/total),
 act_id = 714953,
 last_appl. = 714853,
 protocols = 0/5/2 (gcs/repl/appl),
 group UUID = 40179a77-2bf4-11e4-b46a-a30695b570ce
140827 8:54:41 [Note] WSREP: Flow-control interval: [28, 28]
140827 8:54:41 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 714953)
140827 8:54:45 [Note] WSREP: New cluster view: global state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714927, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
140827 8:54:45 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140827 8:54:45 [Note] WSREP: New cluster view: global state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714927, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
140827 8:54:45 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140827 8:54:45 [Note] WSREP: New cluster view: global state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714927, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
140827 8:54:45 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140827 8:54:45 [Note] WSREP: State transfer required:
 Group state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714953
 Local state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714927
140827 8:54:45 [Note] WSREP: New cluster view: global state: 40179a77-2bf4-11e4-b46a-a30695b570ce:714953, view# 5: Primary, number of nodes: 3, my index: 1, protocol version 2
140827 8:54:45 [Warning] WSREP: Gap in state sequence. Need state transfer.
140827 8:54:47 [Note] WSREP: You have configured 'xtrabackup-v2' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
140827 8:54:47 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140827 8:54:47 [Note] WSREP: REPL Protocols: 5 (3, 1)
140827 8:54:47 [Note] WSREP: Service thread queue flushed.
140827 8:54:47 [Note] WSREP: Assign initial position for certification: 714953, protocol version: 3
140827 8:54:47 [Note] WSREP: Service thread queue flushed.
140827 8:54:47 [Warning] WSREP: Releasing seqno 714927 before 714928 was assigned.
140827 8:54:48 [Warning] WSREP: Failed to prepare for incremental state transfer: Failed to open IST listener at tcp://10.0.1.10:4568', asio error 'Cannot assign requested address': 99 (Cannot assign requested address)
  at galera/src/ist.cpp:prepare():334. IST will be unavailable.
140827 8:54:48 [Note] WSREP: Member 1.0 (283c4a3333ec) requested state transfer from '*any*'. Selected 0.0 ()(SYNCED) as donor.
140827 8:54:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 714955)
140827 8:54:48 [Note] WSREP: Requesting state transfer: success, donor: 0
140827 8:54:48 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 0 bytes
140827 8:54:48 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 1/2 locked buffers
140827 8:54:48 [Warning] WSREP: 0.0 (): State transfer to 1.0 (283c4a3333ec) failed: -125 (Operation canceled)
140827 8:54:48 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():723: Will never receive state. Need to abort.
140827 8:54:48 [Note] WSREP: gcomm: terminating thread
140827 8:54:48 [Note] WSREP: gcomm: joining thread
140827 8:54:48 [Note] WSREP: gcomm: closing backend
140827 8:54:48 [Note] WSREP: view(view_id(NON_PRIM,1df53171-2d4d-11e4-829e-c29c0950cb7d,6) memb {
 aaa04c84-2d4c-11e4-a1fe-c605ab19a328,0
} joined {
} left {
} partitioned {
 1df53171-2d4d-11e4-829e-c29c0950cb7d,0
 d616fec0-2d4d-11e4-b768-d28e90659024,0
})
140827 8:54:48 [Note] WSREP: view((empty))
140827 8:54:48 [Note] WSREP: gcomm: closed
140827 8:54:48 [Note] WSREP: /usr/sbin/mysqld: Terminated.

This has happened multiple times (read- happens almost every day), usually after 10 or so hours of operation.

The port 4568 is exposed and mapped (-p 4568:4568).

@ayurchen
Copy link
Member

ayurchen commented Sep 2, 2014

  1. It does not matter if port is "exposed". Does 10.0.1.10 exist? Can another application like nc be started to listen at that address? What does netstat say at that point?
  2. I'd also try to figure out what is happening so that it prints
    140827 8:54:35 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.52987S), skipping check

@Jaykah
Copy link
Author

Jaykah commented Sep 2, 2014

Thanks Alex!

Looks like you are spot-on on the address assignment. I have installed netcat and it is only able to attach with docker in --net=host mode. This is strange because none of the Docker Galera implementations that I could find on the net mention using --net=host (or exposing 4568 for that matter).

I guess I'll treat "packaged" builds with a grain of salt going forward :)

I will update this issue if there are any further developments in regards to the matter in question.

Before we close it out though - I have a question pertaining to the second point you've outlined

I'd also try to figure out what is happening so that it prints 140827 8:54:35 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.52987S), skipping check

What does it mean, and could it be related to the issue in question? If not, I would really appreciate pointers as to how to solve it, as I don't want to experience the implications of the matter prior to having an understanding of how to solve it :)

@ayurchen
Copy link
Member

ayurchen commented Sep 3, 2014

it means that timer thread in galera could not get CPU time for more than 1.5 seconds. In lay terms, the mysqld process was blocked - and that is what caused cluster partitioning. It is likely an IO issue.

@Jaykah
Copy link
Author

Jaykah commented Sep 3, 2014

I am using Azure, and I get the same message across servers, so it's unlikely a disk issue.

Can it be related to using AUFS as my Docker storage backend?

Are there any other scenarios where this error might pop up?

Once again, thank you for your help!

@ayurchen
Copy link
Member

ayurchen commented Sep 3, 2014

Disk or not, but the symptom is really simple: timer thread was stuck for more than 1.5 seconds. And given that other nodes discarded this node from the group means that from their perspective that node was stuck too. I guess you'll have to observe the system more closely to figure out what is going on. Never used Azure, so no idea.

@bradjones1
Copy link

This is an old thread (and can probably be closed) but for anyone in a similar situation where your advertised address to other nodes differs from the bind address for the IST transfer, consider the ist.recv_bind provider option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants