Bug#325600: <defunct> threads.... a solution?
Daniel Jacobowitz wrote:
On Sat, Nov 12, 2005 at 07:36:21PM -0500, Tom Evans wrote:
How can this possibly be fixed by changing waitpid_not_cancel? That
call is in pthread_reap_children, which isn't even reached by this
test, as far as I can tell. Of course it should be. And of course GDB
and strace are both a bit broken on alpha, so I'm having some trouble
tracking it down.
Anyway, I'll give it another shot later.
Well, let's see if I can recall the exact chain of events...
1) A thread exits
2) An event is sent to the thread manager thread
3) The thread manager processes the end of thread signal
4) The thread manager eventually calls pthread_reap_children in an
attempt to, um, reap the children, free the resources, etc.
5) waitpid_not_cancel is called within pthread_reap_children to find the
exited children, which calls some variation of the wait4 system call.
From my understanding, an exited thread stays defunct until a "wait"
succeeds, but the "wait" call used in waitpid_not_cancel is busted, so it
never returns any recently exited thread ids.
6) strace must be used with -f to see the appropriate trace - that's how
I found the culprit eventually.
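
To make step 5 concrete, here is a rough sketch of the reaping pattern,
using plain fork() and waitpid() instead of the real thread manager code.
The helper names spawn_exiting_children and reap_exited_children are made
up for illustration; the actual pthread_reap_children goes through
waitpid_not_cancel and may pass different flags. The point is only that an
exited child stays defunct until a "wait" call returns its id, so a wait
that never returns anything leaves the entries around forever.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that exit immediately.
   Returns the number of children actually created. */
static int spawn_exiting_children(int n)
{
    int created = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);      /* child: exit at once; defunct until reaped */
        if (pid > 0)
            created++;     /* parent: count successful forks */
    }
    return created;
}

/* Hypothetical helper: collect every child that has already exited,
   without blocking. Returns the number of children reaped. */
static int reap_exited_children(void)
{
    int reaped = 0;
    int status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;      /* one defunct entry is gone */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;      /* interrupted by a signal, try again */
        break;             /* 0: none exited yet; -1/ECHILD: no children */
    }
    return reaped;
}
```

If the wait call in reap_exited_children never returned a positive id, as
described for waitpid_not_cancel above, the loop would fall straight
through and every exited child would remain listed as <defunct>.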
Also, be sure you are using the unstable or testing libc and libpthread;
otherwise you won't see the defunct entries. If the Debian machine you are
using has stable installed, set LD_LIBRARY_PATH to point to the unstable
version.