Bug#465652: marked as done (libc6: Occasional failed wakeup in pthread_cond_wait)

To: "Adam Olsen" <rhamph@gmail.com>
Subject: Bug#465652: marked as done (libc6: Occasional failed wakeup in pthread_cond_wait)
From: owner@bugs.debian.org (Debian Bug Tracking System)
Date: Tue, 19 Feb 2008 18:21:06 +0000
Message-id: <handler.465652.D465652.120344509230109.ackdone@bugs.debian.org>
References: <aac2c7cb0802191011m42f74984la4d9ec1fab26008b@mail.gmail.com> <aac2c7cb0802131053g40593c05ga4c1819cf9c72f77@mail.gmail.com>

Your message dated Tue, 19 Feb 2008 11:11:57 -0700
with message-id <aac2c7cb0802191011m42f74984la4d9ec1fab26008b@mail.gmail.com>
and subject line Re: Bug#465652: Info received (Bug#465652: Acknowledgement (libc6: Occasional failed wakeup in pthread_cond_wait))
has caused the Debian Bug report #465652,
regarding libc6: Occasional failed wakeup in pthread_cond_wait
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
465652: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=465652
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems

--- Begin Message ---

To: "Debian Bug Tracking System" <submit@bugs.debian.org>
Subject: libc6: Occasional failed wakeup in pthread_cond_wait
From: "Adam Olsen" <rhamph@gmail.com>
Date: Wed, 13 Feb 2008 11:53:32 -0700
Message-id: <aac2c7cb0802131053g40593c05ga4c1819cf9c72f77@mail.gmail.com>

Package: libc6
Version: 2.7-5
Severity: normal


In my multithreaded application I'm finding calls to pthread_cond_wait
are occasionally not woken by pthread_cond_broadcast.

Some possibly relevant factors:
* This is a single CPU, single core box
* There are typically 1-3 threads calling pthread_cond_wait
* A single global cond is used, but each thread has its own lock
* A maintenance thread (although the roles change) acquires all of the
  threads' locks, ensuring they're all asleep.  It then calls
  pthread_cond_broadcast, followed by releasing all their locks
* The maintenance thread does this repeatedly, successfully waking up
  other threads from the cond, as well as repeatedly acquiring and
  releasing the hung thread's lock
* I've verified with my own logging and strace that the maintenance
  thread is acquiring the same lock passed to pthread_cond_wait by the
  hung thread
* A snippet from strace's log (full size 43 megs):
  http://pastebin.com/f41c0c791
* I've verified with gdb that the hung thread is in "#0  0xb7edf820 in
  pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0"
* Attaching and detaching gdb causes the hung thread to wake up and
  finish normally.

I was told of a patch on IRC, but I was later told it did not affect
x86 (which I'm using).  For posterity, here's what I had written:

Additionally, I was told on IRC of a patch set to glibc's locking code
that came out after 2.7.  I haven't verified if these would fix it, or
if they're even related, but it's something to consider.
http://sources.redhat.com/bugzilla/show_bug.cgi?id=5240
Three changed files are linked there.  I was given a 4th on IRC, which
I'm told is a correction.
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/lowlevellock.c.diff?cvsroot=glibc&r1=1.18&r2=1.19


-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.22-2-k7 (SMP w/1 CPU core)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1                       1:4.2.2-1  GCC support library

libc6 recommends no packages.

-- debconf information excluded


-- 
Adam Olsen, aka Rhamphoryncus

--- End Message ---

--- Begin Message ---

To: 465652-done@bugs.debian.org
Subject: Re: Bug#465652: Info received (Bug#465652: Acknowledgement (libc6: Occasional failed wakeup in pthread_cond_wait))
From: "Adam Olsen" <rhamph@gmail.com>
Date: Tue, 19 Feb 2008 11:11:57 -0700
Message-id: <aac2c7cb0802191011m42f74984la4d9ec1fab26008b@mail.gmail.com>
In-reply-to: <handler.465652.B465652.120336219127169.ackinfo@bugs.debian.org>
References: <aac2c7cb0802181116w161ab5b5ia018286f5326ca1a@mail.gmail.com> <handler.465652.B465652.120336219127169.ackinfo@bugs.debian.org>

Sorry folks, user error. :(  I finally noticed the relevant paragraph in SUSv2:

"The effect of using more than one mutex for concurrent
pthread_cond_wait() or pthread_cond_timedwait() operations on the same
condition variable is undefined; that is, a condition variable becomes
bound to a unique mutex when a thread waits on the condition variable,
and this (dynamic) binding ends when the wait returns."

In my defence, the man page is ambiguous.  I interpreted it as only
explaining how to use pthread_cond_wait, not as a property of the
condition itself (otherwise why wouldn't pthread_cond_init take a
mutex argument?):

       A condition variable must always be associated with a mutex,  to  avoid
       the race condition where a thread prepares to wait on a condition vari‐
       able and another thread signals the condition  just  before  the  first
       thread actually waits on it.
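
For illustration only, here is a minimal sketch of the usage the SUSv2
paragraph permits: every waiter passes the same mutex to
pthread_cond_wait, and that mutex also protects the predicate being
waited on.  All names (gate_lock, gate_cond, gate_open, run_waiters)
are hypothetical and are not taken from the application in this
report; the limit of 16 threads is an arbitrary assumption.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names; one mutex is bound to the one condition variable. */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gate_cond = PTHREAD_COND_INITIALIZER;
static int gate_open = 0;   /* predicate, protected by gate_lock */
static int woken = 0;       /* count of released waiters, protected by gate_lock */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gate_lock);
    /* Re-check the predicate in a loop: spurious wakeups are permitted,
     * and the broadcast may already have happened before we got here. */
    while (!gate_open)
        pthread_cond_wait(&gate_cond, &gate_lock);
    woken++;
    pthread_mutex_unlock(&gate_lock);
    return NULL;
}

/* Starts n waiters (0 <= n <= 16), opens the gate with a broadcast,
 * joins the waiters, and returns how many of them were released. */
int run_waiters(int n)
{
    pthread_t tids[16];
    int i, result;

    assert(n >= 0 && n <= 16);

    pthread_mutex_lock(&gate_lock);
    gate_open = 0;
    woken = 0;
    pthread_mutex_unlock(&gate_lock);

    for (i = 0; i < n; i++) {
        int rc = pthread_create(&tids[i], NULL, waiter, NULL);
        assert(rc == 0);
    }

    /* Change the predicate and broadcast while holding the same mutex
     * the waiters use, so no waiter can miss the state change. */
    pthread_mutex_lock(&gate_lock);
    gate_open = 1;
    pthread_cond_broadcast(&gate_cond);
    pthread_mutex_unlock(&gate_lock);

    for (i = 0; i < n; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_lock(&gate_lock);
    result = woken;
    pthread_mutex_unlock(&gate_lock);
    return result;
}
```

Calling run_waiters(3) from a program linked with -pthread should
return 3.  A design that needs per-thread locks would, under this
reading of the standard, also need a separate condition variable for
each lock.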


-- 
Adam Olsen, aka Rhamphoryncus
--- End Message ---
