Re: is there some known regression with gcc on armel?
On Thu, Aug 19, 2021 at 9:03 AM Luca Olivetti <firstname.lastname@example.org> wrote:
> I upgraded a test machine (buffalo linkstation pro/Marvell Orion5x) from
> buster to bullseye, then I rebuilt the dspam[*] deb (since it's been
> dropped by debian since buster, I did the same there).
> The newly built binary doesn't start, complaining of a configuration
> error, the binary built with buster still works.
> I traced the execution with gdb and it doesn't make sense: a function is
> called with a pointer to the configuration struct but inside the
> function it is null.
> I then recompiled it with -O0 instead of the default -O2 and this time
> it works (will try later with -O1).
> I'm not very familiar with gcc, so do you know of any regression with
> gcc optimizations on armel (maybe with some other package needing
> special optimization options)?
> Note that I had to add the "-z muldefs" option to the linker, but I
> don't think that's the problem (the null I saw wasn't a global variable
> but a parameter).
> [*] I know it's a dead project, but it still works and it is
> surprisingly effective and lightweight on such an under powered machine.
The old Marvell CPUs have a known bug when processing the ldrd/strd
instructions on misaligned pointers, which leads to incorrect data
instead of trapping into the kernel.
This only happens for incorrect source code that relies on undefined
behavior, accessing a pointer to a 64-bit 'long long' variable that is
not naturally aligned. This happens to work on most CPUs including
all x86 and armv6+, and mostly works on armv5 because the kernel
works around the undefined behavior by fixing up the load in an
exception handler.
OpenWrt actually carried a patch against gcc for this in the past,
though with a misleading description (this is only a bug on Marvell
CPUs, not ARM926, and gcc doesn't seem to do anything it
shouldn't be allowed to do for correct source code).
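To illustrate the kind of source bug meant here, below is a minimal C sketch. It is not taken from dspam: the helper names load_u64 and is_aligned_for_u64 are made up for this example. It shows the usual portable way to read a 64-bit value from a possibly misaligned address (memcpy), with the undefined cast pattern kept only as a comment for contrast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 64-bit value from an arbitrary byte address.
 * memcpy makes no alignment assumption about p, so the compiler may not
 * treat p as a naturally aligned 64-bit pointer when choosing load
 * instructions. */
static uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* The problematic pattern, shown for contrast only:
 *
 *     uint64_t v = *(const uint64_t *)p;
 *
 * This is undefined behavior when p is not suitably aligned for uint64_t,
 * and the compiler is free to assume the alignment holds.
 */

/* Hypothetical debugging aid: returns 1 if p is suitably aligned for a
 * uint64_t on the current target, 0 otherwise. */
static int is_aligned_for_u64(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint64_t)) == 0;
}
```

A check like is_aligned_for_u64() could be dropped into the suspect function as a temporary assertion to see whether the pointer being dereferenced is actually misaligned.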
To confirm that this is the actual problem, can you try building the
package using '-O2 -march=armv4t' or '-O2 -march=armv5t' to
override the default 'armv5te'?
Regarding the question why this showed up now, I can only guess,
probably a combination of multiple factors:
- CPU architectures such as ARMv5 without native unaligned
  access are much less common than they used to be, as
  the industry is converging on x86/armv6+/riscv, so bugs in
  application source code don't get found as quickly as they
  used to.
- Any armel binaries from before the Debian Buster release were
  built for ARMv4T rather than ARMv5TE, so they did not use LDRD/STRD.
- Newer GCC versions tend to find better optimizations, so they
  may use LDRD/STRD when old versions did not.