
Bug#755397: tst-eintr3 test suite failure on alpha might be due to a kernel bug



I am starting to think this tst-eintr3 test suite failure on Alpha
might be a kernel bug.  My reasoning goes as follows.

Nowhere does glibc call the wruniq PALcall, so it is not glibc that
sets up the thread pointer for a process.  The thread pointer
is passed as an argument to the clone() syscall, and it is the
kernel that initialises the process control block (PCB) for a
process and ensures that it is switched in when the process is
scheduled.  That raises the question of whether it might be the
kernel that is failing to correctly initialise the thread pointer
in the PCB.

I therefore ran tst-eintr3 under strace to check that glibc is
calling the clone() syscall correctly.

When tst-eintr3 works correctly the syscall trace is (after deleting
quite a bit of irrelevancy at the start and at the end)

clone(child_stack=0x20000a1eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x20000a1f2c0, tls=0x20000a1f8e0, child_tidptr=0x20000a1f2c0) = 20119
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=20118, si_uid=1000} ---
write(1, ".", 1.)                        = 1
sigreturn() (mask [])                   = 20119
mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x20000a20000
mprotect(0x20000a20000, 8192, PROT_NONE) = 0
clone(child_stack=0x2000121eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2000121f2c0, tls=0x2000121f8e0, child_tidptr=0x2000121f2c0) = 20120


In total, clone() gets called twice, and in both cases it looks like
glibc has passed sensible arguments to clone().


Now, the syscall trace for the case when tst-eintr3 segfaults is
illuminating:

clone(child_stack=0x20000a1eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x20000a1f2c0, tls=0x20000a1f8e0, child_tidptr=0x20000a1f2c0) = 20087
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=20086, si_uid=1000} ---
write(1, ".", 1.)                        = 1
sigreturn() (mask [])                   = 20087
mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x20000a20000
mprotect(0x20000a20000, 8192, PROT_NONE) = 0
clone(child_stack=0x2000121eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2000121f2c0, tls=0x2000121f8e0, child_tidptr=0x2000121f2c0) = ? ERESTARTNOINTR (To be restarted)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=20086, si_uid=1000} ---
write(1, ".", 1.)                        = 1
sigreturn() (mask [])                   = -1 ERRNO_312 (Unknown error 312)
clone(child_stack=0x2000121eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2000121f2c0, tls=0, child_tidptr=0x2000121f2c0) = 20089
+++ killed by SIGSEGV +++


The clone() syscall appears three times in the trace because the second
attempt failed with ERESTARTNOINTR and that particular call was
retried (the third call to clone() above) with the same arguments, except
that the tls argument is now zero!  That error, ERESTARTNOINTR, is a
kernel internal error and should not be visible to userspace.  Indeed,
it is the kernel that retries the clone() syscall, because the first try
at running clone() ended up in a mess (something about receiving a signal
right at the worst moment of cloning the process) and the way to
recover is to abandon the clone() function and retry it from the start.
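
A minimal sketch of how the comparison above could be automated when
looking through longer strace logs.  It is not part of the original
investigation; the function names are hypothetical, and the regular
expression assumes the clone() line layout shown in the traces above.

```python
import re

# Assumed layout: a line starting with "clone(", containing "tls=<value>",
# and ending with ") = <result>" as in the strace output above.
CLONE_RE = re.compile(r"^clone\(.*\btls=(0x[0-9a-fA-F]+|0)\b.*\)\s*=\s*(.*)$")


def clone_calls(trace_lines):
    """Return a list of (tls, result) pairs, one per clone() line."""
    calls = []
    for line in trace_lines:
        m = CLONE_RE.match(line.strip())
        if m:
            # int(..., 16) accepts both "0x..." and a bare "0".
            calls.append((int(m.group(1), 16), m.group(2).strip()))
    return calls


def suspicious_restarts(calls):
    """Return (old_tls, new_tls) pairs where a clone() that was marked
    ERESTARTNOINTR is followed by a clone() with a different tls value."""
    found = []
    for prev, cur in zip(calls, calls[1:]):
        if "ERESTARTNOINTR" in prev[1] and cur[0] != prev[0]:
            found.append((prev[0], cur[0]))
    return found
```

Feeding it the lines of the failing trace, for example with
suspicious_restarts(clone_calls(open("trace.txt"))) where trace.txt is
a hypothetical file holding the strace output, should report the pair
of tls values 0x2000121f8e0 and 0.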

But why is the tls argument zeroed on the retry?  The alpha version of
clone() does not pass the tls argument in the normal way (as a
register).  Instead the architecture-specific code resorts to finding it
on the stack where the CPU registers were saved on entry to the kernel.
But when an error is returned from a syscall, the kernel overwrites the
stack location for the a3 CPU register with the errno return, so that on
final exit from the kernel the a3 CPU register will contain the errno.
Then, if appropriate, it retries the syscall, ensuring that the a3 CPU
register has the original contents it had on entry to the kernel.
But that is no good for the clone() syscall, because it ignores the
a3 CPU register (which is the tls argument) and goes to the saved
register on the stack, which has by now been changed to zero.  Voila,
a segmentation fault then results in userspace as the thread pointer
is invalid.
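
The suspected sequence can be illustrated with a toy model.  This is
only a sketch of the hypothesis above, not kernel code: the dictionaries
standing in for registers and the saved stack frame, and the function
name, are all hypothetical, and the overwritten value is simply passed
in as a parameter rather than derived.

```python
def model_retry(tls, status):
    """Toy model of the suspected failure; returns the tls value seen
    (via the register, via the saved frame) on the retried clone()."""
    # Registers as userspace sets them up for clone(); only a3 matters here.
    live_regs = {"a3": tls}

    # On kernel entry the registers are saved to a frame on the stack.
    saved_frame = dict(live_regs)

    # First attempt fails: the saved a3 slot is overwritten with the status.
    saved_frame["a3"] = status

    # The retry is set up with the original a3 contents in the register.
    retry_regs = {"a3": live_regs["a3"]}

    # A normal argument read would use the register ...
    seen_via_register = retry_regs["a3"]
    # ... but the alpha clone() path is suspected of reading the saved slot.
    seen_via_saved_frame = saved_frame["a3"]
    return seen_via_register, seen_via_saved_frame
```

With the values from the failing trace, model_retry(0x2000121f8e0, 0)
gives the original tls through the register but zero through the saved
frame, which matches the tls=0 seen on the third clone() call.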

I'll take this to linux-alpha and the lkml for thoughts on my analysis
and to work out a solution.

Cheers
Michael.

