Description of the problem
During GCPC, one of our judgehosts got stuck in a deadlock and stopped judging submissions.
Your environment
DOMjudge version: b32fc5c8c5a84160e6e26203228fbbe1ec8444e9 with 028995f9c00e7897ec863283986ef995661e38b9 cherry-picked on top of that
Operating system / Linux distribution and version: Debian GNU/Linux 12 (bookworm), 6.1.0-27-amd64
Webserver: nginx/1.22.1
Steps to reproduce
We don't have a reproducer. The problem is probably timing-dependent.
Expected behaviour
The judgehost should judge the submission and continue judging afterwards.
Actual behaviour
The judgehost stops judging and the web interface shows a warning.
Any other information that you want to share?
Here's a stacktrace from the runguard process:
(gdb) bt
#0 0x00007fa4dc0a50d6 in __lll_lock_wait_private () from target:/lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fa4dc07da45 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fa4dc135a7f in __fprintf_chk () from target:/lib/x86_64-linux-gnu/libc.so.6
#3 0x000055cc552c6621 in fprintf (__fmt=0x55cc552cb004 "%s: warning: ", __stream=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:79
#4 warning (format=format@entry=0x55cc552cbc08 "timelimit exceeded (hard wall time): aborting command") at runguard.cc:203
#5 0x000055cc552c6d8d in terminate (sig=14) at runguard.cc:693
#6 <signal handler called>
#7 0x00007fa4dc07d9ff in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#8 0x00007fa4dc135a7f in __fprintf_chk () from target:/lib/x86_64-linux-gnu/libc.so.6
#9 0x000055cc552c6621 in fprintf (__fmt=0x55cc552cb004 "%s: warning: ", __stream=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:79
#10 warning (format=format@entry=0x55cc552ccf28 "timelimit exceeded (hard cpu time)") at runguard.cc:203
#11 0x000055cc552c62c1 in main (argc=<optimized out>, argv=<optimized out>) at runguard.cc:1517
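It seems like the signal handler terminate() for SIGALRM called fprintf() (via warning()) while main was itself in the middle of an fprintf() call, also via warning() (frames #7 to #11 in the backtrace). fprintf seems to lock the stream, and since main cannot resume until the signal handler returns, the lock taken by main is never released and the process is stuck. fprintf is also not on the POSIX list of async-signal-safe functions, so calling it from a handler is not guaranteed to work in any case.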
I think fixing this would require (at least) removing all output from the signal handler.