
Re: Build failure of janest-base on arm64...



Hi Steve,

On 13/01/2020 at 18:48, Steve McIntyre wrote:
>> janest-base FTBFS on arm64 (and only on this architecture) (buildd
>> arm-arm-04) with a SIGILL. However, I cannot reproduce on arm64
>> porterbox (amdahl). I've given it back, and it still fails (on arm-ubc-02).
>>
>> Does somebody have an idea on what's going on?
> 
> Hmmm. So, that's failing on two different machines, each with
> different hardware types:
> 
>  * arm-arm-04 (AMD Seattle, Cortex-A57)
>  * arm-ubc-02 (Socionext Synquacer, Cortex-A53)
> 
> but works on YA different one:
> 
>  * amdahl (APM X-Gene 1)
> 
> I've just checked on arm-arm-04 and I don't see anything in the syslog
> to give clues as to what might have failed. Checking locally on my
> Macchiatobin:
> 
>  * mjolnir  (Marvell Armada 8040, Cortex-A72)
> 
> it also fails, which gives me a more useful way to attack this with a
> debugger.
> 
> (sid-arm64)steve@mjolnir:~/build/janest-base/janest-base-0.13.0/_build/default/compiler-stdlib/src$ gdb ../gen/gen.exe 
> ...
> (gdb) r -ocaml-where /usr/lib/ocaml -o caml.ml
> Starting program: /home/steve/build/janest-base/janest-base-0.13.0/_build/default/compiler-stdlib/gen/gen.exe -ocaml-where /usr/lib/ocaml -o caml.ml
> 
> Program received signal SIGILL, Illegal instruction.
> 0x0000aaaaab0647f8 in e843419@0031_00000203_b74 ()
> (gdb) bt
> #0  0x0000aaaaab0647f8 in e843419@0031_00000203_b74 ()
> #1  0x0000aaaaab062ff8 in camlPredef__common_initial_env_322 () at typing/predef.ml:227
> #2  0x0000aaaaab063700 in camlPredef__build_initial_env_390 () at typing/predef.ml:239
> #3  0x0000aaaaab07800c in camlEnv__entry () at typing/env.ml:2685
> #4  0x0000aaaaaafb1adc in caml_program ()
> #5  0x0000aaaaab21f644 in caml_start_program ()
> #6  0x0000aaaaab21fec0 in caml_startup_common ()
> #7  0x0000aaaaab21ff08 in caml_startup ()
> #8  0x0000aaaaaafb1220 in main ()

It seems that "e843419@0031_00000203_b74" is a stub generated by ld to
work around ARM erratum 843419, which concerns (at least) Cortex-A53. It
looks like the generated stub is invalid!

> (gdb) list
> 1       ../sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: No such file or directory.

I don't understand where this dl-procinfo.c reference comes from.

> (gdb) disassemble 
> Dump of assembler code for function e843419@0031_00000203_b74:
> => 0x0000aaaaab0647f8 <+0>:     .inst   0x00000000 ; undefined
>    0x0000aaaaab0647fc <+4>:     b       0xaaaaab063008 <camlPredef__common_initial_env_322+2112>
> End of assembler dump.

I guess the illegal instruction is ".inst 0x00000000".

I ran "objdump -D" on this gen.exe binary (as generated on amdahl), and
I could not find "e843419@0031_00000203_b74". I did find many
occurrences of "e843419@", in particular "e843419@0031_00000205_b74",
which looks very similar to the above. In every case, though, the
"e843419@" stub consists of one "ldr" or "str" instruction followed by a
"b". Is that also the case on mjolnir?
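
To answer that without reading the whole dump by eye, a rough Python
sketch along these lines could scan "objdump -d" output and list every
"e843419@" stub whose first instruction is not a load or store. The
function name and the regular expressions are my own assumptions about
objdump's usual text layout; nothing here comes from binutils itself.

```python
import re

# Symbol header as objdump usually prints it (assumed layout), e.g.
# "00000000000b47f8 <e843419@0031_00000203_b74>:"
HEADER = re.compile(r"^[0-9a-f]+ <(?P<name>[^>]+)>:\s*$")
# Instruction line: address, 8 hex digits of encoding, then the mnemonic.
INSN = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]{8}\s+(?P<mnemonic>\S+)")


def suspicious_stubs(disassembly):
    """Return (stub name, first mnemonic) for each e843419@ stub whose
    first instruction is neither an ldr nor an str variant."""
    bad = []
    current = None  # stub whose first instruction has not been seen yet
    for line in disassembly.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group("name")
            current = name if name.startswith("e843419@") else None
            continue
        insn = INSN.match(line)
        if insn and current is not None:
            mnemonic = insn.group("mnemonic")
            if not mnemonic.startswith(("ldr", "str")):
                bad.append((current, mnemonic))
            current = None  # only the first instruction is checked
    return bad
```

Passing it the text of "objdump -d gen.exe" from both amdahl and
mjolnir and comparing the two lists should show whether the broken stub
is specific to the binary built on mjolnir.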

> I'm not sure exactly what's going on here. I know *nothing* about
> ocaml to know what the code in predef.ml is trying to do. Line 227 is
> 
>   add_type ident_exn decl_exn (

This is a mere function call updating a data structure. It doesn't look
exotic.

> The reference to dl-procinfo.c is buried in the guts of glibc - that
> file defines the expected cpuinfo flags. *Guessing* - is something in
> ocaml trying to parse the cpuinfo flags and making a mistake? That
> *might* explain why you're getting different results here from one
> machine to the next. But that's just a guess.

Erratum 843419 doesn't affect all processors; maybe this could explain
why the issue happens only on some processors...?

I don't think ocaml itself parses cpuinfo flags, but maybe something in
libc (or ld) does to determine whether e843419 applies?

Thank you for your help... but what can be done next?

Christian Marillat says that "Binutils package is broken see #911990",
but this bug has been marked as fixed-upstream for more than a year. Is
it really still relevant?


Cheers,

-- 
Stéphane

