Bug#755397: glibc FTBFS on alpha: tst-eintr3 sometimes fails.



Source: glibc
Version: 2.19-7
Severity: important
User: debian-alpha@lists.debian.org
Usertags: alpha
Justification: Fails to build from source but built in the past.

The test tst-eintr3 sometimes fails in the build of glibc on alpha
and has done so twice in a row in attempting to build 2.19-7.

It's an intermittent fault that appears to occur only on an SMP
system (which the buildd imago is).  Running the test manually 40 or
so times under a UP kernel never produced a failure.

To make testing faster I have used the upstream glibc source on the
2.19 branch, configured with --enable-hardcoded-path-in-tests, and run
tst-eintr3 with the --direct option.  It occasionally segfaults.
Getting a core dump and analysing it with gdb gives the following:

Core was generated by `/home/mjc/toolchain/glibc-build/nptl/tst-eintr3 --direct'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  start_thread (arg=0x2000121f1f0) at pthread_create.c:243
243	  __resp = &pd->res;

(gdb) bt full
#0  start_thread (arg=0x2000121f1f0) at pthread_create.c:243
        pd = 0x2000121f1f0
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0 <repeats 17 times>}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x2000003da00 <start_thread>, 
              0x2000121f1f0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 252416}}}
        not_first_call = <optimized out>
        robust = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#1  0x0000020000177d24 in thread_start ()
    at ../ports/sysdeps/unix/sysv/linux/alpha/clone.S:111
No locals.

(gdb) disass /m

Dump of assembler code for function start_thread:
232	{
   0x000002000003da00 <+0>:	ldah	gp,3(t12)
   0x000002000003da04 <+4>:	lda	gp,-14800(gp)
   0x000002000003da08 <+8>:	lda	sp,-240(sp)
   0x000002000003da14 <+20>:	stq	fp,40(sp)
   0x000002000003da18 <+24>:	mov	sp,fp
   0x000002000003da24 <+36>:	stq	s0,8(sp)
   0x000002000003da28 <+40>:	stq	ra,0(sp)
   0x000002000003da30 <+48>:	stq	s1,16(sp)
   0x000002000003da38 <+56>:	stq	s2,24(sp)
   0x000002000003da3c <+60>:	stq	s3,32(sp)
   0x000002000003da40 <+64>:	stq	a0,224(fp)

233	  struct pthread *pd = (struct pthread *) arg;
234	
235	#if HP_TIMING_AVAIL
236	  /* Remember the time when the thread was started.  */
237	  hp_timing_t now;
238	  HP_TIMING_NOW (now);
239	  THREAD_SETMEM (pd, cpuclock_offset, now);
240	#endif
241	
242	  /* Initialize resolver state pointer.  */
243	  __resp = &pd->res;
   0x000002000003da0c <+12>:	rduniq
   0x000002000003da10 <+16>:	ldq	t0,-32656(gp)
   0x000002000003da20 <+32>:	addq	v0,t0,t0
   0x000002000003da2c <+44>:	lda	t1,1208(a0)
   0x000002000003da34 <+52>:	mov	v0,s0
=> 0x000002000003da44 <+68>:	stq	t1,0(t0)


The __resp variable appears to be a thread-local variable that is
accessed (here, written) using the initial-exec TLS model.  The rduniq
PALcall should put the thread pointer (from the PCB) into register
v0; the offset loaded into t0 (presumably from the GOT) is then added
to it to form the address of __resp.  Now let's check the address
being written to at the point of the segfault.

(gdb) print /x $t0
$1 = 0x18

That's definitely not a valid memory location since the first page of
memory starting at location 0 should be inaccessible.  Checking the
thread pointer:

(gdb) print /x $v0
$2 = 0x0

Ouch!  That looks like the thread pointer in the PCB has not been
initialised.

Running tst-eintr3 under gdb and setting a breakpoint on line 243
reveals that, in general, the rduniq PALcall does return a valid
memory address (presumably the correct thread pointer), but
occasionally, on an SMP system, it can return 0.

This is as far as I have got with debugging.  Presumably there is a
wruniq PALcall somewhere that sets up the thread pointer in the PCB,
and that might be the next place to investigate.

Cheers
Michael.