
Bug#223110: Race condition between fork() and exit() when using pthread_atfork() from a shared library



retitle 223110 Race condition between fork() and exit() when using pthread_atfork() from a shared library
severity 223110 important
tags 223110 upstream
thanks

Figured it out. (This mail is self-contained; reading the earlier bug
report is not necessary.) First, here's a test case that breaks for me:

--8<--- foo.c ----------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>

void exit_on_signal(int signr)
{
  fprintf(stderr, "Exiting on signal from child\n");
  exit(0);
}

extern void foo(void);

int main(void)
{
  foo();
  signal(SIGUSR2,exit_on_signal);
  pid_t parent = getpid();
  if (fork() == 0)
    kill(parent, SIGUSR2);
  return 0;
}
------------------------------

--8<--- libfoo.c -------------
#include <pthread.h>

void
do_prepare(void)
{
}

void
do_child(void)
{
}

void
foo(void)
{
  pthread_atfork(&do_prepare, NULL, &do_child);
}
------------------------------

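Compile libfoo.c into a shared library (libfoo.so), then compile foo.c
into an executable that links against it. With gcc, something along
the lines of "gcc -shared -fPIC -o libfoo.so libfoo.c -lpthread" and
"gcc -o foo foo.c -L. -lfoo" should do; run the result with
LD_LIBRARY_PATH=. so the loader finds the library.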

Here's the interesting part of what happens when it runs:

Enter main()
 -> Enter foo()
     -> pthread_atfork() registers the handlers (it doesn't matter
        which ones are present; I think three NULLs will still break),
        and associates them with libfoo.so. refcntr on this handler is
        initialised to 1
 -> fork()
     -> Enter __libc_fork() (in nptl/sysdeps/unix/sysv/linux/fork.c)
         -> Call do_prepare()
         -> Increment refcntr on the atfork handler (refcntr == 2)
         -> Invoke the fork syscall
       child -> Call do_child()
             -> Decrement refcntr on the atfork handler (refcntr == 1)
 -> (child, back in main()) Send signal SIGUSR2 to the parent
 -> (child) Exit
parent -> (still inside __libc_fork(), refcntr == 2) Enter exit_on_signal()
           -> Enter exit()
               ...
               -> Unload libfoo
                   -> Call __unregister_atfork() for libfoo (in nptl/sysdeps/unix/sysv/linux/unregister-atfork.c)
                       -> Decrement refcntr on the atfork handler (refcntr == 1)
                       -> Wait for refcntr to reach zero

This condition can never become true. __libc_fork() incremented refcntr
on the atfork handler but will never decrement it: for that to happen,
the signal handler would have to return, which would require exit() to
return. __unregister_atfork() therefore hangs, waiting for refcntr to
reach zero.
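
The counting above can be replayed in a single process. The following
is a minimal sketch, not glibc's code: the struct and function names
(model_handler and friends) are made up for illustration, and where the
real implementation would block, this model only reports that it would.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one registered atfork handler. */
struct model_handler {
  int refcntr;
};

/* pthread_atfork(): the handler starts with one reference. */
void model_register(struct model_handler *h)
{
  h->refcntr = 1;
}

/* __libc_fork() before the fork syscall: take a reference. */
void model_fork_enter(struct model_handler *h)
{
  h->refcntr++;
}

/* __libc_fork() after the fork syscall: drop the reference again.
   This is the step that is never reached when a signal handler
   calls exit() before fork() has returned. */
void model_fork_leave(struct model_handler *h)
{
  h->refcntr--;
}

/* __unregister_atfork(): drop the registration's reference, then wait
   for zero.  Returns true if, in this single-threaded model, that wait
   could never finish. */
bool model_unregister_would_hang(struct model_handler *h)
{
  h->refcntr--;
  return h->refcntr != 0;
}

/* Replays the trace.  interrupted != 0 means the signal handler ran
   exit() while the parent was still inside fork(). */
bool model_exit_hangs(int interrupted)
{
  struct model_handler h;

  model_register(&h);
  model_fork_enter(&h);
  if (!interrupted)
    model_fork_leave(&h);
  return model_unregister_would_hang(&h);
}
```

In this model, model_exit_hangs(1) corresponds to the trace above and
model_exit_hangs(0) to the usual case where fork() returns before
exit() is called.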

Note that the parent did not wake up from the fork syscall until after
the child had sent the signal. This is a race condition: the hang only
occurs if the child sends the signal almost right away.

In threading-speak, __libc_fork() has acquired a lock without
registering a suitable cancellation handler, and was then cancelled by
a signal handler.
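
For readers less familiar with the analogy: in pthreads, code that
holds a lock across a cancellation point normally registers a cleanup
handler so the lock is released if the thread is cancelled. The sketch
below shows that pattern with standard pthread calls. The demo_* names
are illustrative only, and nothing here is taken from glibc's fork
implementation.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs if the thread is cancelled while it holds
   demo_lock. */
static void demo_unlock(void *arg)
{
  pthread_mutex_unlock((pthread_mutex_t *) arg);
}

/* Takes the lock, then sits in a cancellation point. */
static void *demo_worker(void *arg)
{
  (void) arg;
  pthread_mutex_lock(&demo_lock);
  pthread_cleanup_push(demo_unlock, &demo_lock);
  for (;;)
    sleep(1);                 /* cancellation point */
  pthread_cleanup_pop(1);     /* not reached; pairs with the push */
  return NULL;
}

/* Cancels the worker and reports whether the lock was released:
   returns 0 if it can be taken again, non-zero otherwise. */
int demo_cancel_releases_lock(void)
{
  pthread_t t;
  int rc;

  if (pthread_create(&t, NULL, demo_worker, NULL) != 0)
    return -1;
  pthread_cancel(t);
  pthread_join(t, NULL);

  rc = pthread_mutex_trylock(&demo_lock);
  if (rc == 0)
    pthread_mutex_unlock(&demo_lock);
  return rc;
}
```

Without the pthread_cleanup_push() call the lock would stay held after
cancellation, which is the situation the analogy describes for refcntr.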

I have no idea how to fix this. <punt>

[This happened on an Athlon (Barton) 2500+, using linux 2.6.0-test11
with CONFIG_PREEMPT disabled]

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ |
 `. `'                          |
   `-             -><-          |
