
Re: Getting rid of alignment faults in userspace



On Sat, Jun 18, 2011 at 03:17:59PM -0400, Nicolas Pitre wrote:
> On Sat, 18 Jun 2011, Arnaud Patard wrote:
> 
> > Dave Martin <dave.martin@linaro.org> writes:
> > Hi,
> > 
> > > Hi all,
> > >
> > > I've recently become aware that a few packages are causing alignment
> > > faults on ARM, and are relying on the alignment fixup emulation code in
> > > the kernel in order to work.
> > >
> > > Such faults are very expensive in terms of CPU cycles, and can generally
> > > only result from wrong code (for example, C/C++ code which violates the
> > > relevant language standards, assembler which makes invalid assumptions,
> > > or functions called with misaligned pointers due to other bugs).
> > >
> > > Currently, on a natty Ubuntu desktop image I observe no faults except
> > > from firefox and mono-based apps (see below).
> > >
> > > As part of the general effort to make open source on ARM better, I think 
> > > it would be great if we can disable the alignment fixups (or at least
> > > enable logging) and work with upstreams to get the affected packages
> > > fixed.
> > >
> > > For release images we might want to be more forgiving, but for development
> > > we have the option of being more aggressive.
> > >
> > > The number of affected packages and bugs appears small enough for the
> > > fixing effort to be feasible, without temporarily breaking whole
> > > distros.
> > >
> > >
> > > For ARM, we can achieve the goal by augmenting the default kernel command-
> > > line options: either
> > >
> > >     alignment=3
> > >         Fix up each alignment fault, but also log the faulting address
> > >         and name of the offending process to dmesg.
> > >
> > >     alignment=5
> > >         Pass each alignment fault to the user process as SIGBUS (fatal
> > >         by default) and log the faulting address and name of the
> > >         offending process to dmesg.
> > 
> > iirc, someone sent some months/years ago a patch to change the default
> 
> That was me.
> 
> > but it has been rejected because there are (were?) some libcs, including
> > glibc, doing unaligned accesses [1], and this can happen early in the
> > boot process. In this kind of case, things like getting a sigbus would
> > hurt.
> 
> This is only partly true.
> 
> Rewind about 15 years ago when all that Linux supported was ARMv3.  On 
> ARMv3 there is no instruction for doing half-word loads/stores, and no 
> instruction to sign extend a loaded byte.
> 
> In those days, the compiler was relying on a documented and 
> architecturally defined behavior of misaligned loads/stores which is to 
> rotate the bytes comprising the otherwise aligned word, the rotation 
> position being defined by the sub-word offset.  Doing so allowed for 
> certain optimizations to avoid extra shifts and masks.
> 
> Then a bunch of binaries were built with a version of GCC making use of 
> those misaligned access tricks.
> 
> Then came along ARMv4 with its LDRH, LDRSH, and LDRSB instructions, 
> making those misaligned tricks unnecessary.  Hence GCC deprecated those 
> optimizations.  Today only the old farts amongst us still remember about 
> this.
> 
> So for quite a while now, having a misaligned access on ARM before ARMv6 
> is quite likely to not produce the commonly expected result.  That's why 
> there is code in the kernel to trap and fix up misaligned accesses.  
> However, it is turned off by default for user space.  Why?
> 
> Turns out that a prominent ARM developer still has binaries from the 
> ARMv3 era around, and the default of not fixing up misaligned user space 
> accesses is for remaining compatible with them.

The default /proc/cpu/alignment mode seems to be 2 (fixup) on v6/v7,
provided that the v6 unaligned access model (CR_U) is supported by the CPU:

arch/arm/mm/alignment.c:

        if (cpu_architecture() >= CPU_ARCH_ARMv6 && (cr_alignment & CR_U)) {
                cr_alignment &= ~CR_A;
                cr_no_alignment &= ~CR_A;
                set_cr(cr_alignment);
                ai_usermode = UM_FIXUP;
        }

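This suggests that by default, ancient binaries will silently misbehave
when running on a v6 or later CPU: with CR_A cleared, a plain misaligned
LDR no longer traps, and under the v6 model it returns the unaligned word
rather than the rotated word those binaries expect.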

> So if you do have a version of glibc that is not from 15 years ago (that 
> would have to be a.out and not ELF if it was) then you do not want to 
> let misaligned accesses go through unfixed, otherwise you'll simply have 
> latent data corruption somewhere.

Note that if we enable SIGBUS instead of fixing up, this is "safe" in
the sense of preferring a fatal signal to incorrect results.

In the linaro/ubuntu/armhf context, I think we'd have few things to fix,
but for debian armel, there is likely to be much more alignment faulting
and there might be too much software to fix for this to be easily achieved.

> > Also, as noted by someone else in the thread, you do want to test on
> > something like armv5* or v4* because there are high chances that the
> > trap used by the alignment fix won't be triggered at all on >= armv6.
> 
> Given that Linaro is working only with Thumb2-compiled user space, that 
> implies ARMv6 and above only.

Note that debian-arm is on CC -- this argument applies to the armhf port
under development, since this targets v7+.

For the Debian armel port though, the pros and cons are somewhat
different since this distro may run on v4/v5.

Cheers
---Dave

> > [1] See commit log of commit d944d549aa86e08cba080396513234cf048fee1f.
> 
> And note the "if not fixed up, results in segfaults" in that log, 
> meaning that the current default is wrong for that case.
> 
> 
> Nicolas

