galera stuck in cleanup after exception from gcomm #45

Open · temeo opened this issue May 30, 2014 · 1 comment · Labels: bug

temeo (Contributor) commented May 30, 2014

Maybe related to the already closed #38.

2014-05-30 16:05:51 27259 [Note] WSREP: Node 10d5d6f4 state prim
2014-05-30 16:05:51 27259 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
...
2014-05-30 16:05:51 27259 [ERROR] WSREP: exception from gcomm, backend must be restarted: 14c8c47a last prims not consistent (FATAL)
         at gcomm/src/pc_proto.cpp:is_prim():787
         at gcomm/src/pc_proto.cpp:handle_msg():1402
         at gcomm/src/evs_proto.cpp:handle_gap():3299
         at gcomm/src/evs_proto.cpp:handle_msg():2127
2014-05-30 16:05:51 27259 [Note] WSREP: Received self-leave message.
2014-05-30 16:05:51 27259 [Note] WSREP: Flow-control interval: [0, 0]
2014-05-30 16:05:51 27259 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-05-30 16:05:51 27259 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 0)
2014-05-30 16:05:51 27259 [Note] WSREP: RECV thread exiting 0: Success
2014-05-30 16:05:51 27259 [Note] WSREP: New cluster view: global state: 10d6e99c-e7fe-11e3-9562-9795814c7421:2182875, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version -1
2014-05-30 16:05:51 27259 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-05-30 16:05:51 27259 [Note] WSREP: Closing send monitor...
2014-05-30 16:05:51 27259 [Note] WSREP: Closed send monitor.
2014-05-30 16:05:51 27259 [Note] WSREP: recv_thread() joined.
2014-05-30 16:05:51 27259 [Note] WSREP: Closing replication queue.
2014-05-30 16:05:51 27259 [Note] WSREP: Closing slave action queue.
2014-05-30 16:05:51 27259 [Note] WSREP: applier thread exiting (code:0)

Three threads were remaining:

Thread 3 (Thread 0x7f1017112700 (LWP 27264)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f1015cf6e8d in gu::Lock::wait (this=<optimized out>, cond=...)
    at galerautils/src/gu_lock.hpp:56
#2  0x00007f1015de0962 in galera::ServiceThd::thd_func (arg=0x2c2d260)
    at galera/src/galera_service_thd.cpp:30
#3  0x00007f1017cece9a in start_thread (arg=0x7f1017112700)
    at pthread_create.c:308
#4  0x00007f10172073fd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f1006ffc700 (LWP 27289)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000005d5133 in inline_mysql_cond_wait (src_line=403, 
    src_file=0xb6e838 "/home/vagrant/codership-mysql/sql/wsrep_thd.cc", 
    mutex=<optimized out>, that=<optimized out>)
    at /home/vagrant/codership-mysql/include/mysql/psi/mysql_thread.h:1162
#2  wsrep_rollback_process (thd=0x7f0ff8000990)
    at /home/vagrant/codership-mysql/sql/wsrep_thd.cc:403
#3  0x00000000005bd137 in start_wsrep_THD (arg=0x5d4c70)
    at /home/vagrant/codership-mysql/sql/mysqld.cc:5350
#4  0x00007f1017cece9a in start_thread (arg=0x7f1006ffc700)
    at pthread_create.c:308
#5  0x00007f10172073fd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f10191aa740 (LWP 27259)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000005ce5db in inline_mysql_cond_wait (mutex=<optimized out>, 
    that=<optimized out>, src_file=<optimized out>, src_line=<optimized out>)
    at /home/vagrant/codership-mysql/include/mysql/psi/mysql_thread.h:1162
#2  inline_mysql_cond_wait (src_line=199, mutex=<optimized out>, 
    that=<optimized out>, src_file=<optimized out>)
    at /home/vagrant/codership-mysql/sql/wsrep_sst.cc:193
#3  wsrep_sst_wait () at /home/vagrant/codership-mysql/sql/wsrep_sst.cc:199
#4  0x00000000005c9805 in wsrep_init_startup (first=true)
    at /home/vagrant/codership-mysql/sql/wsrep_mysqld.cc:699
#5  0x00000000005c1636 in init_server_components ()
    at /home/vagrant/codership-mysql/sql/mysqld.cc:4946
#6  0x00000000005c2235 in mysqld_main (argc=36, argv=0x2baa3d8)
    at /home/vagrant/codership-mysql/sql/mysqld.cc:6063
#7  0x00007f101713476d in __libc_start_main (
    main=0x5a14b0 <main(int, char**)>, argc=15, ubp_av=0x7fffad9c8cd8, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffad9c8cc8) at libc-start.c:226
#8  0x00000000005b4a5d in _start ()

However, it is a bit unclear whether this is a Galera or a MySQL side issue.
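
All three stacks share the same shape: a thread parked in pthread_cond_wait() with nothing left on the other side to signal it. A minimal sketch of that pattern, using hypothetical names rather than the actual Galera or MySQL code:

#include <pthread.h>

// Hypothetical sketch of the wait pattern common to all three stuck
// threads; the names here are illustrative, not the real code.
static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool            done = false;  // never becomes true on this path

// Each stuck thread is parked in a loop like this: ServiceThd on its
// service queue, the rollbacker on its request queue, the main thread
// on SST completion.
void wait_for_completion()
{
    pthread_mutex_lock(&mtx);
    while (!done)
        pthread_cond_wait(&cond, &mtx);
    pthread_mutex_unlock(&mtx);
}

// The matching signal lives in a teardown path that never runs once
// the RECV thread has exited after the gcomm exception.
void signal_completion()
{
    pthread_mutex_lock(&mtx);
    done = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
}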

temeo added the bug label on Jun 4, 2014

ayurchen (Member) commented Jun 9, 2014

Service thread 3 being stuck in a lock wait is understandable: the provider was not unloaded, so the destructor for the thread was never called. Who is supposed to unload the provider in that case is somewhat unclear. It looks like one of the exiting slave threads should, and in this case there must have been only the initial one:

2014-05-30 16:05:51 27259 [Note] WSREP: applier thread exiting (code:0).

It should also signal the initialization thread that is waiting for the SST to complete.
So there seems to be a combination of two issues (a sketch of the missing path follows the list):

  1. the slave thread exited with code 0, which probably does not prompt it to do any cleanup - this is the Galera-side issue.
  2. there is likely no cleanup implemented for this case (the initialization stage) - this is the MySQL-side issue.
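
A rough sketch, for illustration only, of what such a cleanup on applier exit might look like; every name below is hypothetical, not the actual wsrep API:

// All names here are hypothetical, for illustration only.
bool is_last_applier_thread();   // true for the final exiting applier
void wsrep_unload_provider();    // would run ~ServiceThd(), waking thread 3
void wsrep_sst_cancel_wait();    // would signal the SST waiter, waking thread 1

void applier_thread_exit(bool provider_failed)
{
    // Issue 1: exiting with code 0 skips this branch entirely.
    // Issue 2: even on the error path, no such cleanup exists today.
    if (provider_failed && is_last_applier_thread())
    {
        wsrep_unload_provider();
        wsrep_sst_cancel_wait();
    }
}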

temeo pushed a commit that referenced this issue on Jul 3, 2017:
GAL-488 Backporting fixes to gcs send monitor from GCF-1033