
about test_socket hang


I found something about the test_socket hang.  The problem I'm seeing
looks like a race condition.  kdump -H yields:

 73334 100505 python   CALL  thr_kill(0x1884e,SIG(null))
 73334 100505 python   RET   thr_kill 0
 73334 100505 python   CALL  thr_exit(0xccab78)
 73334 100430 python   CALL  thr_kill(0x187f9,SIG(null))
 73334 100430 python   RET   thr_kill 0
 73334 100430 python   CALL  poll(0xc6eee0,0x1,0x7d0)

As you can see in this log, thread 100505 sends the restart signal to
thread 100430 *BEFORE* thread 100430 has entered the poll() system
call.  Thread 100430 then gets stuck in poll() with nobody left to
restart it.
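The ordering above looks like a classic lost-wakeup race: assuming the
signal is delivered and handled before the target thread enters
poll(), poll() itself has nothing left to react to.  A common way to
make such wakeups persistent is the self-pipe trick.  The sketch below
uses only Python's standard os, select and threading modules; it
reproduces the log's ordering (the waker finishes first) and shows
that a byte written to a pipe stays queued until poll() is called.  It
illustrates the general technique only and is not the attached
workaround.diff; the helpers wait_for_wakeup and send_wakeup are
hypothetical names.

```python
import os
import select
import threading


def wait_for_wakeup(read_fd, timeout_ms):
    """Block in poll() until read_fd is readable or timeout_ms expires.

    Returns True if a wakeup byte was consumed, False on timeout.
    (Hypothetical helper for illustration.)
    """
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    events = poller.poll(timeout_ms)  # timeout is in milliseconds
    if not events:
        return False
    os.read(read_fd, 1)  # drain the wakeup byte
    return True


def send_wakeup(write_fd):
    """Wake the waiter by writing one byte to the pipe.

    Unlike a signal that is handled before the waiter reaches poll(),
    the byte stays in the pipe until it is read, so it cannot be lost.
    (Hypothetical helper for illustration.)
    """
    os.write(write_fd, b"x")


# Same ordering as the kdump log: the waker runs to completion
# *before* the waiter enters poll().
r, w = os.pipe()
waker = threading.Thread(target=send_wakeup, args=(w,))
waker.start()
waker.join()  # wakeup sent and waker exited, like thread 100505
woke = wait_for_wakeup(r, 2000)  # 2000 ms, i.e. poll(..., 0x7d0)
print("woken" if woke else "timed out")  # prints "woken"
os.close(r)
os.close(w)
```

With a pipe, the wakeup is state (an unread byte) rather than an
event, so it does not matter whether it arrives before or after the
waiter enters poll().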

If I hit ^Z, poll() is interrupted, and then "fg" makes it progress.
By doing this a couple of times I managed to make test_socket finish.
Alternatively, I patched glibc with a workaround (see attachment),
which makes the test finish in about 16 seconds.

Going back to the race between thr_kill() and poll(), any idea what
could be causing this?  To begin with, I don't even know how to get a
backtrace for the relevant calls :-/

Robert Millan

Attachment: workaround.diff
Description: Binary data
