
Re: armhf: abel.d.o hardware status ?



On Wed, Jun 29, 2022 at 5:34 PM Wookey <wookey@wookware.org> wrote:
>
> On 2022-06-29 15:13 +0200, Mathieu Malaterre wrote:
> > On Wed, Jun 29, 2022 at 2:48 PM Wookey <wookey@wookware.org> wrote:
>
> > > What exactly is going wrong when you try to use valgrind?
> >
> > Well you should see something like this on abel.d.o:
> >
> > * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=928224#27
> >
> > Basically anytime you build valgrind using gcc-11 or gcc-12 (debian
> > sid package), you get this weird illegal instruction:
> >
> > ```
> > % ./vg-in-place
> > Illegal instruction
> > ```
>
> I have a strong suspicion that this is neon-itis. The issue generally
> manifests as 'illegal instruction' (i.e. a neon instruction is issued on
> hardware that isn't able to execute it). It has always been the case
> that software should not assume neon is present on v7 (because it
> isn't on all hardware), and most code gets this right, but I've
> recently seen gcc putting those instructions into the startup code
> (where the C-environment is set up and variables allocated) which gets
> executed _before_ any functions checking for which HWCAPS to enable,
> and thus which code to run.
>
> You can check if a binary contains NEON instructions using
> readelf -A
>
> and look for
> Tag_Advanced_SIMD_arch: NEONv1
>
> However just because it's in the binary doesn't mean it's wrong. The
> binary may have been built using ifunc or other mechanisms to choose
> appropriate functions depending on whether or not neon hardware is available.
>
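
The readelf check can be scripted. A minimal sketch in Python, assuming
`readelf -A` prints the attribute as a `Tag_Advanced_SIMD_arch: <value>`
line as in the example above. The function names `simd_arch` and
`simd_arch_of` are made up for illustration. As noted, a hit only says
that NEON code is present somewhere in the binary, not that it runs
unconditionally.

```python
import re
import subprocess


def simd_arch(readelf_output):
    """Return the Tag_Advanced_SIMD_arch value from `readelf -A` text, or None."""
    match = re.search(
        r"^\s*Tag_Advanced_SIMD_arch:\s*(\S+)", readelf_output, re.MULTILINE
    )
    return match.group(1) if match else None


def simd_arch_of(path):
    """Run `readelf -A` on a binary (needs binutils) and extract the tag."""
    result = subprocess.run(
        ["readelf", "-A", path], capture_output=True, text=True, check=True
    )
    return simd_arch(result.stdout)
```

For example, `simd_arch_of("/path/to/binary")` returns None when the
attribute is absent.
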
> A simple check for whether this is your issue is just to run the same test on harris.debian.org.
> If it works OK there that strongly suggests you have a neon problem.
>
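
Before comparing the two machines it may help to check what each one
advertises. A small sketch, assuming a 32-bit ARM Linux kernel that
lists `neon` on the `Features` line of /proc/cpuinfo; the helper name
`cpu_has_neon` is hypothetical.

```python
def cpu_has_neon(cpuinfo_text):
    """True if any 'Features' line in /proc/cpuinfo text lists 'neon'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the whole word so that e.g. 'vfpv3' never counts as a hit.
        if sep and key.strip() == "Features" and "neon" in value.split():
            return True
    return False
```

On the machine itself this would be called as
`cpu_has_neon(open("/proc/cpuinfo").read())`.
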
> Also if you run the program under gdb (on abel) and when it barfs do:
> (gdb) disassemble
> and look for instructions that start with 'v', like 'vmov.i32'
> that will confirm which instruction is tripping it up.
>
> This bug has an example of the problem:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998043
>
> I got partway through a long followup with some details of possible
> fixes some months ago but got sidetracked (and oh look it's been
> pending for 6 months already).
>
> The reason this has broken appears to be that gcc has changed the way
> the fpu is specified/defaulted, so neon _and_ fp are enabled by
> default if no specific fpu option is given. (i.e. we just set
> -march=armv7). It used to be that -march=armv7 implied +nosimd.  (or
> something like that - I never quite got to the bottom of it enough to
> be sure exactly what the right general or specific fix was).
>
> If you rebuild with
> -march=armv7-a+nosimd+nofp
> or
> -march=armv7-a+nosimd+fp
> you should be able to determine if being more explicit about the fp and simd(neon) instructions used makes it behave.

If I compare gcc-10 vs gcc-11 I see:

malat@abel ~ % gcc-10 --verbose
Using built-in specs.
COLLECT_GCC=gcc-10
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/10/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian
10.3.0-16' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2
--prefix=/usr --with-gcc-major-version-only --program-suffix=-10
--program-prefix=arm-linux-gnueabihf- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/lib
--enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-libitm --disable-libquadmath --disable-libquadmath-support
--enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions
--with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard
--with-mode=thumb --disable-werror --enable-checking=release
--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf
--target=arm-linux-gnueabihf
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.0 (Debian 10.3.0-16)

while

malat@abel ~ % gcc-11 --verbose
Using built-in specs.
COLLECT_GCC=gcc-11
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/11/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian
11.3.0-3' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2
--prefix=/usr --with-gcc-major-version-only --program-suffix=-11
--program-prefix=arm-linux-gnueabihf- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/lib
--enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-libitm --disable-libquadmath --disable-libquadmath-support
--enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions
--with-arch=armv7-a+fp --with-float=hard --with-mode=thumb
--disable-werror --enable-checking=release --build=arm-linux-gnueabihf
--host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Debian 11.3.0-3)

Could someone confirm that this configuration is correct for Debian
armhf (no neon)? I fail to understand why the configure options would
change for us (--with-arch=armv7-a --with-fpu=vfpv3-d16 suddenly became
--with-arch=armv7-a+fp).
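
To avoid eyeballing the two long "Configured with:" lines, here is a
rough sketch that extracts the --with-* options and reports the ones
that differ. The function names are mine, not part of any tool, and the
input is simply the text pasted from `gcc-NN --verbose`.

```python
import shlex


def configure_options(configured_with):
    """Map each --with-* option in a 'Configured with:' line to its value."""
    opts = {}
    for token in shlex.split(configured_with):
        if token.startswith("--with-"):
            name, _, value = token.partition("=")
            opts[name] = value
    return opts


def changed_options(old, new):
    """Return {option: (old_value, new_value)}; None means the option is absent."""
    a, b = configure_options(old), configure_options(new)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Fed the two lines above, it should report --with-arch and --with-fpu,
plus the version-specific --with-pkgversion and --with-bugurl.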

If I read the online doc correctly,

https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

states:

-mfpu=name
[...]
The setting ‘auto’ is the default and is special. It causes the
compiler to select the floating-point and Advanced SIMD instructions
based on the settings of -mcpu and -march.

In the case of valgrind I can see:

` -marm -mcpu=cortex-a8`

I cannot find in the doc whether 'cortex-a8' implies neon or not.
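
One way to settle this might be to ask the compiler itself. A sketch,
assuming gcc follows ACLE and defines the `__ARM_NEON` macro whenever
Advanced SIMD is enabled for the selected flags; `neon_enabled` and
`dump_macros` are illustrative names, while `-dM -E` is the usual gcc
option pair for dumping predefined macros.

```python
import subprocess


def neon_enabled(macro_dump):
    """True if a `gcc -dM -E` macro dump defines __ARM_NEON."""
    for line in macro_dump.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define" and parts[1] == "__ARM_NEON":
            return True
    return False


def dump_macros(cc, flags):
    """Ask the compiler for its predefined macros under the given flags."""
    cmd = [cc, *flags, "-dM", "-E", "-x", "c", "/dev/null"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

On abel this would be something like
`neon_enabled(dump_macros("gcc-11", ["-marm", "-mcpu=cortex-a8"]))`.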

> It seems likely that you have hit this problem.
> I think this is the same thing too: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=982794
> (Firefox dying with illegal instruction on non-neon hardware)

> I _suspect_ that debian needs to change the default flags to actually
> say 'armv7+fp+nosimd' by default so that we get what we expect (and
> define as the base ISA) and it doesn't depend on what hardware the
> build was done on.

Ah! Now it starts to make sense.

> Wookey
> --
> Principal hats:  Debian, Wookware, ARM
> http://wookware.org/

