retitle 223110 Race condition between fork() and exit() when using pthread_atfork() from a shared library
severity 223110 important
tags 223110 upstream
thanks

Figured it out (this mail is self-contained; reading the earlier bug report is not necessary).

First, here's a test case that breaks for me:

--8<--- foo.c ----------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>

void exit_on_signal(int signr)
{
  fprintf(stderr, "Exiting on signal from child\n");
  exit(0);
}

extern void foo(void);

int main(void)
{
  foo();
  signal(SIGUSR2, exit_on_signal);
  pid_t parent = getpid();
  if (fork() == 0)
    kill(parent, SIGUSR2);
  return 0;
}
------------------------------

--8<--- libfoo.c -------------
#include <pthread.h>

void do_prepare(void)
{
}

void do_child(void)
{
}

void foo(void)
{
  pthread_atfork(&do_prepare, NULL, &do_child);
}
------------------------------

Compile libfoo.c into a shared library (.so) and foo.c into an executable that links against it.

Here's the interesting part of what happens when it runs:

Enter main()
 -> Enter foo()
   -> pthread_atfork() registers the handlers (it doesn't matter which ones are present; I think three NULLs will still break) and associates them with libfoo.so. refcntr on this handler is initialised to 1
 -> fork()
   -> Enter __libc_fork() (in nptl/sysdeps/unix/sysv/linux/fork.c)
     -> Call do_prepare()
     -> Increment refcntr on the atfork handler (refcntr == 2)
     -> Invoke the fork syscall

child
 -> Call do_child()
 -> Decrement refcntr on the atfork handler (refcntr == 1 in the child's own copy of memory; the parent's counter is still 2)
 -> Send signal SIGUSR2 to the parent
 -> Exit

parent
 -> Enter exit_on_signal()
   -> Enter exit()
     ...
     -> Unload libfoo
       -> Call __unregister_atfork() for libfoo (in nptl/sysdeps/unix/sysv/linux/unregister-atfork.c)
         -> Decrement refcntr on the atfork handler (refcntr == 1)
         -> Wait for refcntr to reach zero

That condition will never become true.
__libc_fork() incremented refcntr on the atfork handler, but it will never decrement it in the parent: for that to happen, the signal handler would have to return, which would require exit() to return. So __unregister_atfork() hangs forever waiting for the counter to reach zero.

Note that the parent did not return from the fork syscall until after the child had sent the signal. This is a race condition: the child must send the signal almost immediately.

In threading-speak, __libc_fork() has acquired a lock without registering a suitable cancellation handler, and was then cancelled by a signal handler.

I have no idea how to fix this. <punt>

[This happened on an Athlon (Barton) 2500+, using Linux 2.6.0-test11 with CONFIG_PREEMPT disabled]

-- 
 .''`.  ** Debian GNU/Linux ** | Andrew Suffield
: :' :  http://www.debian.org/ |
`. `'                          |
  `-             -><-          |