Bug#223110: marked as done (Race condition between fork() and exit() when using pthread_atfork() from a shared library)

To: Pierre HABOUZIT <madcoder@debian.org>
Subject: Bug#223110: marked as done (Race condition between fork() and exit() when using pthread_atfork() from a shared library)
From: owner@bugs.debian.org (Debian Bug Tracking System)
Date: Mon, 05 Feb 2007 14:49:40 -0800
Message-id: <handler.223110.D223110.117071277915271.ackdone@bugs.debian.org>
References: <20070205215935.GA14002@hades.madism.org> <20031206213742.GA29361@doc.ic.ac.uk>

Your message dated Mon, 5 Feb 2007 22:59:35 +0100
with message-id <20070205215935.GA14002@hades.madism.org>
and subject line Race condition between fork() and exit() when using pthread_atfork() from a shared library
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---

To: submit@bugs.debian.org
Subject: Something fishy with pthread_atfork()
From: Andrew Suffield <asuffield@debian.org>
Date: Sat, 6 Dec 2003 21:37:43 +0000
Message-id: <20031206213742.GA29361@doc.ic.ac.uk>

Package: glibc

I don't know what to do with this one. I have a library that follows
the description in libc.info about how to register atfork handlers so
that mutexes behave safely across fork() (it doesn't ever fork
directly). Usually it works fine, but I have one particular case where
glibc gets stuck here:

#0  0x401d7a8a in __unregister_atfork (dso_handle=0x40221d68)
    at ../nptl/sysdeps/unix/sysv/linux/unregister-atfork.c:107

102	      /* Decrement the reference counter.  If it does not reach zero
103		 wait for the last user.  */
104	      atomic_decrement (&deleted->handler->refcntr);
105	      unsigned int val;
106	      while ((val = deleted->handler->refcntr) != 0)
107		lll_futex_wait (deleted->handler->refcntr, val);

#1  0x4012c706 in __cxa_finalize (d=0x4024a220) at cxa_finalize.c:49
#2  0x40230940 in __do_global_dtors_aux () from /usr/lib/liblookup.so.0
#3  0x40246ea6 in _fini () from /usr/lib/liblookup.so.0
#4  0x4000c1c1 in _dl_fini () at dl-fini.c:168
#5  0x4012c4a5 in *__GI_exit (status=0) at exit.c:60
#6  0x0805c4e2 in ?? ()

(gdb) p *deleted->handler
$5 = {next = 0x40223620, prepare_handler = 0x40239e6a <do_prepare_fork>, 
  parent_handler = 0x40239e90 <do_parent_fork>, child_handler = 0x40239eb6 <do_child_fork>, 
  dso_handle = 0x4024a220, refcntr = 1, need_signal = 1}

(The function fields are what I expect)

It's a race condition of some kind, so while I can duplicate it, I
can't construct a useful test case - the case where it gets stuck is
during exit of update-menus, but only when invoked as part of a dpkg
run for one particular package (the library that registers the atfork
handlers is loaded via an NSS module, so it tends to be present in
most processes).

In case I've done something stupid, here's the code that uses atfork itself:

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t lookupd_mutex = PTHREAD_MUTEX_INITIALIZER;

...

/* Ye Overly Complicated Fork Handler, as described by the glibc
 * manual. Lock before fork, unlock in parent, reinit in child
 */

static void
do_prepare_fork(void)
{
  pthread_mutex_lock(&lookupd_mutex);
}

static void
do_parent_fork(void)
{
  pthread_mutex_unlock(&lookupd_mutex);
}

static void
do_child_fork(void)
{
  close_lookupd();
  pthread_mutex_init(&lookupd_mutex, NULL);
}

static void
do_init_threads(void)
{
  pthread_mutex_init(&lookupd_mutex, NULL);
  if (pthread_atfork(&do_prepare_fork, &do_parent_fork, &do_child_fork) != 0)
    abort();
}

static void init_threads(void) __attribute__((constructor));
static void
init_threads(void)
{
  pthread_once(&init_once, do_init_threads);
}


Some printfs indicate that at this point in time, there's nothing
interesting happening in the rest of the library. However, the prepare
handler was called once for the process which has got stuck, and the
child handler was called for a child that it forked earlier, but the
parent handler was never called.

Assuming that's true, it would explain why __unregister_atfork got
stuck (after looking at the nptl __libc_fork() implementation), but I
can't figure out how it could have happened (according to the
specification for pthread_atfork(), it's not supposed to be possible).
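
To make the reasoning above concrete, here is a toy model of the
reference counting. It is not glibc code: the toy_* names are
hypothetical, and the rule that fork() takes a reference before the
prepare handlers and drops it after the parent handlers is an assumption
based on a reading of the nptl fork code, not something quoted in this
report. It only shows why a missing parent phase leaves refcntr at 1,
which is the value in the gdb dump.

```c
/* Toy model of the atfork handler reference count. NOT glibc code. */
struct toy_handler {
  unsigned int refcntr;
};

/* Registration holds one reference for the lifetime of the handler. */
static void toy_register(struct toy_handler *h)
{
  h->refcntr = 1;
}

/* Assumed: fork() takes a reference before running the prepare handler. */
static void toy_fork_prepare(struct toy_handler *h)
{
  h->refcntr++;
}

/* Assumed: fork() drops that reference after running the parent handler. */
static void toy_fork_parent_done(struct toy_handler *h)
{
  h->refcntr--;
}

/* Mirrors the quoted unregister step: drop the registration reference,
 * then report whether the count reached zero. Returns 1 if unregister
 * could finish, 0 if it would have to wait for another user. */
static int toy_unregister_would_finish(struct toy_handler *h)
{
  h->refcntr--;
  return h->refcntr == 0;
}
```

With a complete fork (prepare, then parent done) the unregister step
finishes. If the parent phase never runs, the count stays at 1 and the
real code would sit in lll_futex_wait with nobody left to wake it.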

Even more interestingly, linuxthreads gets stuck too, in a similar
fashion; at approximately the same place, it tries to lock the mutex
that protects its fork code, and stalls.

#0  0x40256104 in __pthread_sigsuspend (set=0x4025c348)
    at ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
#1  0x40255f07 in __pthread_wait_for_restart_signal (self=0x4025b540) at pthread.c:1203
#2  0x40257746 in __pthread_alt_lock (lock=0x4021c1b0, self=0x4025b540) at restart.h:34
#3  0x40254922 in *__GI___pthread_mutex_lock (mutex=0x4021c1a0) at mutex.c:123
#4  0x401d9b5b in __unregister_atfork (dso_handle=0x400c1000)
    at ../linuxthreads/sysdeps/unix/sysv/linux/unregister-atfork.c:30
#5  0x4012d272 in __cxa_finalize (d=0x400c1000) at cxa_finalize.c:49
#6  0x4005f2b0 in ?? () from /usr/lib/libstdc++.so.5
...
#14 0x4000c1c1 in _dl_fini () at dl-fini.c:168

Again, the prepare and child handlers were called, but not the parent
handler. The fact that both nptl and linuxthreads get stuck in the
same fashion suggests that I did something wrong - but I've spent all
day screwing with it, and the parent handler is never called, and it
never segfaults (efence and valgrind come up blank).

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ |
 `. `'                          |
   `-             -><-          |


--- End Message ---

--- Begin Message ---

To: Andrew Suffield <asuffield@debian.org>, 223110-done@bugs.debian.org
Subject: Re: Race condition between fork() and exit() when using pthread_atfork() from a shared library
From: Pierre HABOUZIT <madcoder@debian.org>
Date: Mon, 5 Feb 2007 22:59:35 +0100
Message-id: <20070205215935.GA14002@hades.madism.org>
Mail-followup-to: Andrew Suffield <asuffield@debian.org>, 223110-done@bugs.debian.org
In-reply-to: <20031212232650.GA23311@suffields.me.uk>
References: <20031212232650.GA23311@suffields.me.uk>

> --8<--- foo.c ----------------
> #include <stdio.h>
> #include <unistd.h>
> #include <signal.h>
> #include <sys/types.h>
> 
> void exit_on_signal(int signr)
> {
>   fprintf(stderr, "Exiting on signal from child\n");
>   exit(0);
> }

  The problem is that it's not safe to call exit() from a signal
handler, because exit() is not async-signal-safe. See
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1148

  So after all this is not really a bug; such behaviour is undefined.
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

--- End Message ---
