IST is not interrupted properly during shutdown #142

Open
temeo opened this issue Sep 24, 2014 · 1 comment
temeo commented Sep 24, 2014

This happens intermittently during testing for #71.

Copied from: https://bugs.launchpad.net/galera/+bug/1176852

People report:

May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: view((empty))
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: gcomm: closed
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: Flow-control interval: [64, 64]
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: Received NON-PRIMARY.
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: Shifting JOINER -> OPEN (TO: 4095760)
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: Received self-leave message.
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: Flow-control interval: [64, 64]
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: Received SELF-LEAVE. Closing connection.
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 4095760)
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: RECV thread exiting 0: Success
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: recv_thread() joined.
May 3 00:46:08 localhost mysqld: 130503 0:46:08 [Note] WSREP: Closing slave action queue.
May 3 02:19:11 localhost mysqld: 130503 2:19:11 [Note] WSREP: IST received: 92f5a26e-aeaf-11e2-0800-fcc64d722c65:4093185
May 3 02:19:11 localhost mysqld: 130503 2:19:11 [ERROR] WSREP: gcs/src/gcs.c:_join():800: Sending JOIN failed: -103 (Software caused connection abort).
May 3 02:19:11 localhost mysqld: 130503 2:19:11 [ERROR] WSREP: Failed to JOIN the cluster after SST
May 3 02:19:12 localhost mysqld: 130503 2:19:12 [Warning] WSREP: Failed to report last committed 4094167, -77 (File descriptor in bad state)
May 3 02:19:14 localhost mysqld: 130503 2:19:14 [Warning] WSREP: Failed to report last committed 4095153, -77 (File descriptor in bad state)
The node won't shut down until IST is over, in this case about 1.5 hours. IST should be easily interruptible, and the node should be able to shut down cleanly.
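The 1.5 hour figure can be checked against the log itself: the last message before the gap ("Closing slave action queue.") and the first one after it ("IST received") carry the timestamps used below. This is only a small sketch; it assumes the `130503 0:46:08` field is a `YYMMDD H:MM:SS` timestamp, and `stall_seconds` is a hypothetical helper name, not part of any Galera tooling.

```python
from datetime import datetime

# Assumed layout of the mysqld timestamp field, e.g. "130503 0:46:08".
FMT = "%y%m%d %H:%M:%S"


def stall_seconds(before, after):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(after, FMT) - datetime.strptime(before, FMT)
    return delta.total_seconds()


# Timestamps taken from the log above, on either side of the gap.
gap = stall_seconds("130503 0:46:08", "130503 2:19:11")
print(f"shutdown stalled for {gap / 3600:.2f} hours")
```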
temeo added the bug label Sep 24, 2014
temeo self-assigned this Sep 24, 2014
temeo commented Oct 4, 2014

Fixing this seems to require refactoring the gcs/replicator interface. The problem is that while the applier threads are busy handling IST events, no thread is left to handle the leave events.
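To illustrate the shape of the problem, here is a minimal sketch of one possible arrangement: a dedicated control thread that stays free to receive the leave request, and an applier loop that checks a stop flag between IST events. It is not Galera's code (which is C/C++) and not a proposed patch; every name in it (`handle_control`, `apply_ist`, `run_demo`, the `"LEAVE"` message) is hypothetical and chosen only for the illustration.

```python
import queue
import threading


def handle_control(control_q, stop):
    """Hypothetical control thread: stays free to react to a leave request."""
    while True:
        msg = control_q.get()
        if msg == "LEAVE":
            stop.set()  # ask the appliers to abandon the remaining IST events
            return


def apply_ist(events, stop, apply_event):
    """Apply IST events in order, checking for a stop request between events.

    Returns the number of events applied before finishing or being interrupted.
    """
    applied = 0
    for ev in events:
        if stop.is_set():
            break
        apply_event(ev)
        applied += 1
    return applied


def run_demo(total_events=1000, leave_at=3):
    """Simulate a leave request arriving while event `leave_at` is being applied."""
    stop = threading.Event()
    control_q = queue.Queue()
    control = threading.Thread(
        target=handle_control, args=(control_q, stop), daemon=True
    )
    control.start()

    def apply_event(seqno):
        if seqno == leave_at:
            control_q.put("LEAVE")
            # Only to keep the demo deterministic: wait until the control
            # thread has processed the request before applying continues.
            stop.wait(timeout=5)

    applied = apply_ist(range(1, total_events + 1), stop, apply_event)
    control.join(timeout=5)
    return applied


if __name__ == "__main__":
    print(run_demo())  # number of events applied before the interruption
```

The point of the sketch is only that the leave request is consumed by a thread that never applies IST events, so the appliers can notice it at the next event boundary instead of after the whole transfer.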
