Your message dated Mon, 5 Feb 2007 22:59:35 +0100
with message-id <20070205215935.GA14002@hades.madism.org>
and subject line Race condition between fork() and exit() when using pthread_atfork() from a shared library
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere. Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)
--- Begin Message ---
- To: submit@bugs.debian.org
- Subject: Something fishy with pthread_atfork()
- From: Andrew Suffield <asuffield@debian.org>
- Date: Sat, 6 Dec 2003 21:37:43 +0000
- Message-id: <20031206213742.GA29361@doc.ic.ac.uk>
Package: glibc

I don't know what to do with this one.

I have a library that follows the description in libc.info about how
to register atfork handlers so that mutexes behave safely across
fork() (it doesn't ever fork directly). Usually it works fine, but I
have one particular case where glibc gets stuck here:

#0  0x401d7a8a in __unregister_atfork (dso_handle=0x40221d68) at ../nptl/sysdeps/unix/sysv/linux/unregister-atfork.c:107
102           /* Decrement the reference counter.  If it does not reach zero
103              wait for the last user.  */
104           atomic_decrement (&deleted->handler->refcntr);
105           unsigned int val;
106           while ((val = deleted->handler->refcntr) != 0)
107             lll_futex_wait (deleted->handler->refcntr, val);
#1  0x4012c706 in __cxa_finalize (d=0x4024a220) at cxa_finalize.c:49
#2  0x40230940 in __do_global_dtors_aux () from /usr/lib/liblookup.so.0
#3  0x40246ea6 in _fini () from /usr/lib/liblookup.so.0
#4  0x4000c1c1 in _dl_fini () at dl-fini.c:168
#5  0x4012c4a5 in *__GI_exit (status=0) at exit.c:60
#6  0x0805c4e2 in ?? ()

(gdb) p *deleted->handler
$5 = {next = 0x40223620, prepare_handler = 0x40239e6a <do_prepare_fork>,
  parent_handler = 0x40239e90 <do_parent_fork>,
  child_handler = 0x40239eb6 <do_child_fork>, dso_handle = 0x4024a220,
  refcntr = 1, need_signal = 1}

(The function fields are what I expect)

It's a race condition of some kind, so while I can duplicate it, I
can't construct a useful test case - the case where it gets stuck is
during exit of update-menus, but only when invoked as part of a dpkg
run for one particular package (the library that registers the atfork
handlers is loaded via an NSS module, so it tends to be present in
most processes).

In case I've done something stupid, here's the code that uses atfork
itself:

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t lookupd_mutex = PTHREAD_MUTEX_INITIALIZER;

...

/* Ye Overly Complicated Fork Handler, as described by the glibc
 * manual. Lock before fork, unlock in parent, reinit in child */
static void do_prepare_fork(void)
{
  pthread_mutex_lock(&lookupd_mutex);
}

static void do_parent_fork(void)
{
  pthread_mutex_unlock(&lookupd_mutex);
}

static void do_child_fork(void)
{
  close_lookupd();
  pthread_mutex_init(&lookupd_mutex, NULL);
}

static void do_init_threads(void)
{
  pthread_mutex_init(&lookupd_mutex, NULL);
  if (pthread_atfork(&do_prepare_fork, &do_parent_fork, &do_child_fork) != 0)
    abort();
}

static void init_threads(void) __attribute__((constructor));
static void init_threads(void)
{
  pthread_once(&init_once, do_init_threads);
}

Some printfs indicate that at this point in time, there's nothing
interesting happening in the rest of the library, however: the prepare
handler was called once for the process which has got stuck, and the
child handler was called for a child that it forked earlier, but the
parent handler was never called. Assuming that's true, it would
explain why __unregister_atfork got stuck (after looking at the nptl
__libc_fork() implementation), but I can't figure out how it could
have happened (according to the specification for pthread_atfork(),
it's not supposed to be possible).

Even more interestingly, linuxthreads gets stuck too, in a similar
fashion; at approximately the same place, it tries to lock the mutex
that protects its fork code, and stalls.

#0  0x40256104 in __pthread_sigsuspend (set=0x4025c348) at ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
#1  0x40255f07 in __pthread_wait_for_restart_signal (self=0x4025b540) at pthread.c:1203
#2  0x40257746 in __pthread_alt_lock (lock=0x4021c1b0, self=0x4025b540) at restart.h:34
#3  0x40254922 in *__GI___pthread_mutex_lock (mutex=0x4021c1a0) at mutex.c:123
#4  0x401d9b5b in __unregister_atfork (dso_handle=0x400c1000) at ../linuxthreads/sysdeps/unix/sysv/linux/unregister-atfork.c:30
#5  0x4012d272 in __cxa_finalize (d=0x400c1000) at cxa_finalize.c:49
#6  0x4005f2b0 in ?? () from /usr/lib/libstdc++.so.5
...
#14 0x4000c1c1 in _dl_fini () at dl-fini.c:168

Again, the prepare and child handlers were called, but not the parent
handler.

The fact that both nptl and linuxthreads gets stuck in the same
fashion suggests that I did something wrong - but I've spent all day
screwing with it, and the parent handler is never called, and it never
segfaults (efence and valgrind come up blank).

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ |
 `. `'                          |
   `-             -><-          |

Attachment: signature.asc
Description: Digital signature
--- End Message ---
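
The report above depends on the order in which pthread_atfork()
handlers run: prepare before the fork, then the parent handler in the
parent and the child handler in the child. A minimal sketch for
checking that order in isolation follows. It assumes a POSIX system
and a single-threaded caller; the names call_log,
register_logging_handlers, fork_once and parent_call_log are
hypothetical and do not come from the library in the report.

```c
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical log buffer: each handler appends its own name. */
static char call_log[64];

static void log_prepare(void) { strcat(call_log, "prepare "); }
static void log_parent(void)  { strcat(call_log, "parent "); }
static void log_child(void)   { strcat(call_log, "child "); }

/* Register the three logging handlers; returns 0 on success. */
int register_logging_handlers(void)
{
    return pthread_atfork(log_prepare, log_parent, log_child);
}

/* Fork once. The child checks its own copy of the log and reports the
 * result through its exit status, using _exit() so that no atexit
 * handlers or destructors run in the child. Returns 0 if the child saw
 * "prepare child ", 1 if it saw something else, -1 on error. */
int fork_once(void)
{
    call_log[0] = '\0';

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(strcmp(call_log, "prepare child ") == 0 ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}

/* The log as seen by the parent after fork_once() returns. */
const char *parent_call_log(void)
{
    return call_log;
}
```

If fork() completes normally, the parent's log should read "prepare"
followed by "parent". A process in which the prepare handler ran but
the parent handler did not, as described in the report, never returned
normally from fork().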
--- Begin Message ---
- To: Andrew Suffield <asuffield@debian.org>, 223110-done@bugs.debian.org
- Subject: Re: Race condition between fork() and exit() when using pthread_atfork() from a shared library
- From: Pierre HABOUZIT <madcoder@debian.org>
- Date: Mon, 5 Feb 2007 22:59:35 +0100
- Message-id: <20070205215935.GA14002@hades.madism.org>
- Mail-followup-to: Andrew Suffield <asuffield@debian.org>, 223110-done@bugs.debian.org
- In-reply-to: <20031212232650.GA23311@suffields.me.uk>
- References: <20031212232650.GA23311@suffields.me.uk>
> --8<--- foo.c ----------------
> #include <stdio.h>
> #include <unistd.h>
> #include <signal.h>
> #include <sys/types.h>
>
> void exit_on_signal(int signr)
> {
>   fprintf(stderr, "Exiting on signal from child\n");
>   exit(0);
> }

  The problem is that it's not possible to safely call exit() from a
signal handler, see http://sources.redhat.com/bugzilla/show_bug.cgi?id=1148

  So after all this is not really a bug; such behaviour is
unspecified.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

Attachment: pgpiMkpyzZDIQ.pgp
Description: PGP signature
--- End Message ---
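
The closing reply rests on the fact that exit() is not on the POSIX
list of async-signal-safe functions. A common alternative is to have
the handler only set a flag and to let normal code call exit() later.
The sketch below illustrates that pattern; it is not a fix taken from
the bug log. The names exit_requested, request_exit,
install_exit_handler and exit_was_requested are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flag written by the handler and read by normal code.
 * volatile sig_atomic_t is the type the C standard provides for this. */
static volatile sig_atomic_t exit_requested = 0;

/* The handler does nothing except record the request. It does not call
 * fprintf() or exit(), neither of which is async-signal-safe. */
static void request_exit(int signr)
{
    (void)signr;
    exit_requested = 1;
}

/* Install the handler for one signal; returns 0 on success, -1 on error. */
int install_exit_handler(int signum)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_exit;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Polled from the main loop, outside signal context. */
int exit_was_requested(void)
{
    return exit_requested != 0;
}
```

A main loop would poll exit_was_requested() and call exit(0) itself,
so that destructors and __cxa_finalize run from normal context and not
on top of an interrupted fork(). Where immediate termination from the
handler is wanted, _exit() is async-signal-safe, but it skips atexit
handlers and stdio flushing.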