
Re: Strange failures on 4.9.6-3 kernel




On 09/02/2017 21:22, James Clarke wrote:
> On 9 Feb 2017, at 23:08, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>> On 09/02/2017 20:14, James Clarke wrote:
>>>> On 9 Feb 2017, at 21:31, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> While testing glibc on the kindly provided T5 machine from the Debian
>>>> environment, I started to see some strange issues on sparc64 where glibc
>>>> is failing mostly on static tests.
>>>>
>>>> Funny thing is I checked the latest working revision I used to update the
>>>> 2.25 release page [1], and the tests that used to pass are now failing. In
>>>> fact I checked even the 2.23 and 2.24 glibc releases and both show the same
>>>> issues as the master branch, so I am almost ruling out a glibc regression
>>>> (which was my first idea).
>>>>
>>>> I noted that the machine kernel was updated (from 4.9.2-2 to 4.9.6-3), but
>>>> I am not sure if this is related to the kernel.  I did not record the
>>>> gcc revision I used in my initial testing.  The static tests are failing
>>>> due to a memcpy call that issues bogus instructions:
>>>>
>>>> (gdb) r
>>>> Starting program: /home/azanella/glibc/glibc-git-build/elf/tst-tls1-static 
>>>>
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> 0x0000000000000340 in ?? ()
>>>> (gdb) bt
>>>> #0  0x0000000000000340 in ?? ()
>>>> #1  0x0000000000101fd8 in __libc_setup_tls () at libc-tls.c:180
>>>> #2  0x0000000000101950 in __libc_start_main (main=0x4e8, argc=<optimized out>, argv=0x7feffffef78, init=0x4a8, fini=0x220, rtld_fini=0x0, stack_end=0x1)
>>>>   at libc-start.c:189
>>>> #3  0x0000000000100704 in _start () at ../sysdeps/sparc/sparc64/start.S:88
>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>>
>>>> (gdb) up
>>>> [...]
>>>>  0x0000000000101fc8 <+344>:   add  %l4, %o0, %o0
>>>>  0x0000000000101fcc <+348>:   mov  %i1, %o1
>>>>  0x0000000000101fd0 <+352>:   call  0x2949c0
>>>>  0x0000000000101fd4 <+356>:   stx  %o0, [ %i4 + 0x20 ]
>>>> => 0x0000000000101fd8 <+360>:   sethi  %hi(0x4800), %g3
>>>>
>>>> It seems 0x2949c0 is an unknown address, where it should be memcpy's.
>>>
>>> Do you have the .o still for this? I would be interested to see what the
>>> relocation was. One thing that has changed within the last week is enabling
>>> PIE by default in GCC, though this call is a plain PC-relative one.
>>>
>>> Regards,
>>> James
>>>
>>
>> Yes, objdump shows:
>>
>> $ objdump -r string/memcpy.o
>> string/memcpy.o:     file format elf64-sparc
>>
>> RELOCATION RECORDS FOR [.text]:
>> OFFSET           TYPE              VALUE 
>> 0000000000000010 R_SPARC_GOT22     __memcpy_niagara4
>> 0000000000000014 R_SPARC_GOT10     __memcpy_niagara4
>> 0000000000000028 R_SPARC_GOT22     __memcpy_niagara2
>> 000000000000002c R_SPARC_GOT10     __memcpy_niagara2
>> 0000000000000040 R_SPARC_GOT22     __memcpy_niagara1
>> 0000000000000044 R_SPARC_GOT10     __memcpy_niagara1
>> 0000000000000058 R_SPARC_GOT22     __memcpy_ultra3
>> 000000000000005c R_SPARC_GOT10     __memcpy_ultra3
>> 0000000000000068 R_SPARC_GOT22     __memcpy_ultra1
>> 000000000000006c R_SPARC_GOT10     __memcpy_ultra1
>> 0000000000000088 R_SPARC_GOT22     __mempcpy_niagara4
>> 000000000000008c R_SPARC_GOT10     __mempcpy_niagara4
>> 00000000000000a0 R_SPARC_GOT22     __mempcpy_niagara2
>> 00000000000000a4 R_SPARC_GOT10     __mempcpy_niagara2
>> 00000000000000b8 R_SPARC_GOT22     __mempcpy_niagara1
>> 00000000000000bc R_SPARC_GOT10     __mempcpy_niagara1
>> 00000000000000d0 R_SPARC_GOT22     __mempcpy_ultra3
>> 00000000000000d4 R_SPARC_GOT10     __mempcpy_ultra3
>> 00000000000000e0 R_SPARC_GOT22     __mempcpy_ultra1
>> 00000000000000e4 R_SPARC_GOT10     __mempcpy_ultra1
>>
>> [debug relocations...]
>>
>> Using GOT relocations is expected for PIE.  And if I build the
>> same object with -fno-pie I do see:
>>
>> string/memcpy.o:     file format elf64-sparc
>>
>> RELOCATION RECORDS FOR [.text]:
>> OFFSET           TYPE              VALUE
>> 0000000000000010 R_SPARC_HI22      __memcpy_niagara4
>> 0000000000000014 R_SPARC_LO10      __memcpy_niagara4
>> 0000000000000028 R_SPARC_HI22      __memcpy_niagara2
>> 000000000000002c R_SPARC_LO10      __memcpy_niagara2
>> 0000000000000040 R_SPARC_HI22      __memcpy_niagara1
>> 0000000000000044 R_SPARC_LO10      __memcpy_niagara1
>> 0000000000000058 R_SPARC_HI22      __memcpy_ultra3
>> 000000000000005c R_SPARC_LO10      __memcpy_ultra3
>> 0000000000000068 R_SPARC_HI22      __memcpy_ultra1
>> 000000000000006c R_SPARC_LO10      __memcpy_ultra1
>> 0000000000000088 R_SPARC_HI22      __mempcpy_niagara4
>> 000000000000008c R_SPARC_LO10      __mempcpy_niagara4
>> 00000000000000a0 R_SPARC_HI22      __mempcpy_niagara2
>> 00000000000000a4 R_SPARC_LO10      __mempcpy_niagara2
>> 00000000000000b8 R_SPARC_HI22      __mempcpy_niagara1
>> 00000000000000bc R_SPARC_LO10      __mempcpy_niagara1
>> 00000000000000d0 R_SPARC_HI22      __mempcpy_ultra3
>> 00000000000000d4 R_SPARC_LO10      __mempcpy_ultra3
>> 00000000000000e0 R_SPARC_HI22      __mempcpy_ultra1
>> 00000000000000e4 R_SPARC_LO10      __mempcpy_ultra1
>>
>> I think no one really tried to build glibc with a default-PIE gcc, so this
>> might be a side effect of it.  I tried to build with CC='gcc -fno-pie', but
>> it failed on sunrpc/cross-rpcgen, again with a segfault due to a bogus jump
>> from a possible mis-relocation.
>>
>> I am rebuilding gcc 6 without default PIE to check if I can rebuild and
>> run glibc correctly.
> 
> I meant libc-tls.o's supposed call to memcpy in __libc_setup_tls.
> 
> Regards,
> James
> 

Ah right, there it is:

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000004 R_SPARC_HI22      _dl_phdr
0000000000000008 R_SPARC_LO10      _dl_phdr
0000000000000010 R_SPARC_HI22      _dl_phnum
0000000000000014 R_SPARC_LO10      _dl_phnum
0000000000000068 R_SPARC_HI22      _dl_tls_static_size
0000000000000070 R_SPARC_LO10      _dl_tls_static_size
00000000000000a4 R_SPARC_HI22      _dl_tls_static_size
00000000000000ac R_SPARC_LO10      _dl_tls_static_size
00000000000000c8 R_SPARC_WDISP30   __sbrk
00000000000000d8 R_SPARC_HI22      _dl_static_dtv
00000000000000dc R_SPARC_LO10      _dl_static_dtv
00000000000000e0 R_SPARC_HI22      _dl_ns
00000000000000e8 R_SPARC_LO10      _dl_ns
0000000000000108 R_SPARC_LO10      _dl_static_dtv
0000000000000110 R_SPARC_OLO10     _dl_static_dtv+0x0000000000000028
0000000000000124 R_SPARC_OLO10     _dl_static_dtv+0x0000000000000020
0000000000000128 R_SPARC_WDISP30   memcpy
0000000000000134 R_SPARC_LO10      _dl_tls_static_size
000000000000014c R_SPARC_LO10      _dl_tls_static_size
0000000000000150 R_SPARC_HI22      _dl_tls_static_used
0000000000000154 R_SPARC_HI22      .bss
000000000000015c R_SPARC_LO10      .bss
0000000000000164 R_SPARC_HI22      _dl_tls_max_dtv_idx
0000000000000168 R_SPARC_LO10      _dl_tls_static_used
000000000000016c R_SPARC_HI22      _dl_tls_static_align
0000000000000174 R_SPARC_LO10      .bss
0000000000000178 R_SPARC_LO10      _dl_tls_max_dtv_idx
000000000000017c R_SPARC_LO10      _dl_tls_static_align
0000000000000180 R_SPARC_HI22      _dl_tls_dtv_slotinfo_list
0000000000000184 R_SPARC_HI22      _dl_tls_static_nelem
000000000000019c R_SPARC_LO10      _dl_tls_dtv_slotinfo_list
00000000000001a0 R_SPARC_OLO10     .bss+0x0000000000000028
00000000000001a4 R_SPARC_LO10      _dl_tls_static_nelem
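
For anyone following along, the R_SPARC_WDISP30 entry for memcpy is the one
that matters here: the linker patches the low 30 bits of the call instruction
with the word displacement (S + A - P) >> 2, and the CPU jumps to
P + 4 * disp30.  Below is a minimal Python sketch of that arithmetic, using the
call site and bogus target from the gdb session quoted earlier.  The function
names are mine, not binutils', and this only illustrates the encoding; it
does not explain why the displacement ended up wrong.

```python
# Sketch of how R_SPARC_WDISP30 relates to a SPARC `call` instruction.
# Helper names are hypothetical; only the arithmetic follows the SPARC ABI.

MASK30 = (1 << 30) - 1
MASK64 = (1 << 64) - 1
CALL_OPCODE = 0b01 << 30  # bits 31:30 of a `call` instruction word


def apply_wdisp30(place, symbol, addend=0):
    """Return the call instruction word the linker would emit at `place`."""
    disp = (symbol + addend - place) >> 2  # word displacement, may be negative
    return CALL_OPCODE | (disp & MASK30)


def call_target(place, insn):
    """Return the address a call instruction word at `place` jumps to."""
    disp = insn & MASK30
    if disp & (1 << 29):  # sign-extend the 30-bit field
        disp -= 1 << 30
    return (place + (disp << 2)) & MASK64


# Call site and (bogus) target taken from the gdb disassembly in this thread.
insn = apply_wdisp30(0x101fd0, 0x2949c0)
print(hex(insn))                         # 0x40064a7c
print(hex(call_target(0x101fd0, insn)))  # 0x2949c0
```

Running call_target() on the instruction word read from the failing binary
(x/x at the call site in gdb) and comparing the result with memcpy's address
from nm would show whether the displacement itself was mis-resolved.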

The relocations seem to be the same with a GCC 6 built without default PIE.
Also, it does seem that default PIE is the only thing conflicting with the
glibc build.  I will check how glibc behaves on other architectures with a
GCC that has this option enabled.
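
To compare objects from the two builds quickly, something like the following
rough Python sketch could scan `objdump -r` output and report whether any GOT
relocations show up.  It is only a heuristic: it looks for "GOT" in the
relocation type name, which matches the SPARC types in the listings quoted
earlier (R_SPARC_GOT22/GOT10 vs. R_SPARC_HI22/LO10) but is an assumption for
other architectures.  The function names and sample strings are mine.

```python
# Heuristic sketch: does an `objdump -r` listing contain GOT relocations?
# Assumes the three-column "OFFSET TYPE VALUE" layout shown in this thread.


def relocation_kinds(objdump_text):
    """Map each symbol to the set of relocation types applied to it."""
    kinds = {}
    for line in objdump_text.splitlines():
        fields = line.split()
        # Relocation rows have exactly: offset, type (R_...), value.
        if len(fields) == 3 and fields[1].startswith("R_"):
            kinds.setdefault(fields[2], set()).add(fields[1])
    return kinds


def uses_got(objdump_text):
    """True if any relocation type name contains 'GOT' (heuristic)."""
    return any(
        "GOT" in reloc_type
        for types in relocation_kinds(objdump_text).values()
        for reloc_type in types
    )


# Two rows from each memcpy.o listing in this thread, quote markers removed.
PIE_SAMPLE = """\
RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000010 R_SPARC_GOT22     __memcpy_niagara4
0000000000000014 R_SPARC_GOT10     __memcpy_niagara4
"""
NO_PIE_SAMPLE = """\
RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000010 R_SPARC_HI22      __memcpy_niagara4
0000000000000014 R_SPARC_LO10      __memcpy_niagara4
"""

print(uses_got(PIE_SAMPLE))     # True
print(uses_got(NO_PIE_SAMPLE))  # False
```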

