
Getting rid of alignment faults in userspace



Hi all,

I've recently become aware that a few packages are causing alignment
faults on ARM, and are relying on the alignment fixup emulation code in
the kernel in order to work.

Such faults are very expensive in terms of CPU cycles, and can generally
only result from wrong code (for example, C/C++ code which violates the
relevant language standards, assembler which makes invalid assumptions,
or functions called with misaligned pointers due to other bugs).

Currently, on a natty Ubuntu desktop image I observe no faults except
from firefox and mono-based apps (see below).

As part of the general effort to make open source on ARM better, I think
it would be great if we could disable the alignment fixups (or at least
enable logging) and work with upstreams to get the affected packages
fixed.

For release images we might want to be more forgiving, but for development
we have the option of being more aggressive.

The number of affected packages and bugs appears small enough for the
fixing effort to be feasible, without temporarily breaking whole
distros.


For ARM, we can achieve the goal by adding one of the following to the
default kernel command-line options:

    alignment=3
        Fix up each alignment fault, but also log the faulting address
        and name of the offending process to dmesg.

    alignment=5
        Pass each alignment fault to the user process as SIGBUS (fatal
        by default) and log the faulting address and name of the
        offending process to dmesg.

Fault statistics can also be obtained at runtime by reading
/proc/cpu/alignment.
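
For anyone wondering where the values 3 and 5 come from: the mode is a
bitmask, as described in the kernel's ARM alignment documentation
(bit 0 = warn, bit 1 = fix up, bit 2 = send SIGBUS). Here is a minimal
sketch of decoding it. The enum, struct and function names are made up
for illustration and are not the kernel's own identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit meanings of the ARM alignment mode value.
 * These names are hypothetical, chosen for this sketch only. */
enum {
    ALIGN_WARN   = 1 << 0,  /* log each fault to the kernel log */
    ALIGN_FIXUP  = 1 << 1,  /* emulate the access in the kernel */
    ALIGN_SIGNAL = 1 << 2   /* deliver SIGBUS to the process    */
};

/* The two settings suggested above are just combinations of bits. */
static_assert((ALIGN_WARN | ALIGN_FIXUP) == 3, "alignment=3 is warn + fixup");
static_assert((ALIGN_WARN | ALIGN_SIGNAL) == 5, "alignment=5 is warn + SIGBUS");

struct align_mode {
    bool warn;
    bool fixup;
    bool sigbus;
};

/* Split a numeric mode value into its three flags. */
static struct align_mode decode_align_mode(unsigned mode)
{
    struct align_mode m;
    m.warn   = (mode & ALIGN_WARN) != 0;
    m.fixup  = (mode & ALIGN_FIXUP) != 0;
    m.sigbus = (mode & ALIGN_SIGNAL) != 0;
    return m;
}
```

So decode_align_mode(3) reports warn and fixup, while decode_align_mode(5)
reports warn and sigbus.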

For other architectures, there may be other arch-specific ways of
achieving something similar.

I'd be interested in people's views on this.

Cheers
---Dave


More background:

Two known instances of misbehaving userland apps are:

    1)  firefox-4.x (bug report pending)

        A char array declared as a container for C++ objects is cast
        directly to an object pointer type and dereferenced, without
        ensuring proper alignment.

        By sheer luck, the presence of an extra member in the containing
        class in firefox-3.x means that the char array has a different
        alignment and so the faults don't occur.

    2)  gtk-sharp2 (https://bugs.launchpad.net/bugs/798315) (affecting
        mono-based GUI apps such as banshee and tomboy)

        char pointers are cast to 64-bit integer pointers and
        dereferenced, as an attempt at comparing string prefixes faster.

These apps typically generate hundreds or thousands of faults per session
rather than millions, but that is still quite a lot of noise in syslog.

I think these are likely to be representative of typical causes of
alignment faults: i.e., attempted optimisations which break the rules of
the language, and which only show up in certain builds, or as side-effects
of routine maintenance.
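
Both patterns have alternatives that stay within the language rules.
The following is only a rough C sketch of the general idea, not the
actual firefox or gtk-sharp2 code: struct payload, struct container and
both function names are invented for illustration, and the first case
is a C analogue of what is really a C++ problem.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the object type kept in the raw buffer. */
struct payload {
    int64_t id;
    double  weight;
};

/* Case 1: raw storage intended to hold a struct payload.
 * A plain char array only guarantees 1-byte alignment, so its
 * placement depends on whatever members happen to precede it.
 * alignas asks the compiler to align it for the intended type. */
struct container {
    char tag;  /* odd-sized member that would otherwise skew the buffer */
    alignas(struct payload) unsigned char storage[sizeof(struct payload)];
};

/* The requirement propagates to the containing struct. */
static_assert(alignof(struct container) >= alignof(struct payload),
              "container must be at least as aligned as its payload");

/* Returns true if the storage address is suitable for struct payload. */
static bool storage_is_aligned(const struct container *c)
{
    return ((uintptr_t)c->storage % alignof(struct payload)) == 0;
}

/* Case 2: compare the first 8 bytes of two buffers.
 * Copying into local integers with memcpy is well defined for any
 * source alignment; compilers commonly reduce it to plain loads
 * where the target permits. Both buffers must hold at least 8 bytes. */
static bool prefix8_equal(const char *a, const char *b)
{
    uint64_t wa, wb;
    memcpy(&wa, a, sizeof wa);
    memcpy(&wb, b, sizeof wb);
    return wa == wb;
}
```

With these, prefix8_equal() can be called on pointers at any offset into
a string, and the container's storage stays correctly aligned even if
members are added or removed around it.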

Code like that is going to be a massive own goal for performance on ARM and 
other architectures which fault unaligned accesses, since the resulting faults
are likely to cost thousands of cycles per instance.

