
Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86



Package: libc6
Version: 2.19-0experimental0
Severity: grave
Justification: causes non-serious data loss

libpthread-2.19 has HLE (hardware-assisted lock elision) support.
Unfortunately, on Intel-based x86 processors, the use of HLE is currently
hazardous.

Summary:  Use of HLE on all current Intel Haswell processors (the only x86
processors with HLE support so far) can cause unpredictable system
behaviour, including the possibility of hangs and memory corruption.
Updating the microcode on these Intel Haswell processors when Intel TSX is
in use by libpthread will cause running processes linked to libpthread to be
killed with SIGILL.

This issue is, AFAIK, impossible to work around in the kernel.  Since glibc
uses the cpuid instruction directly, the kernel cannot prevent libpthread
from attempting to use Intel TSX.
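
As a diagnostic aid, here is a minimal sketch (Python, not part of glibc or
the kernel) that lists the TSX-related flags found in /proc/cpuinfo-style
text.  The helper name tsx_flags is hypothetical.  It assumes the Linux flag
names "hle" and "rtm", and it reports only the kernel's view: glibc reads
cpuid itself, so a flag missing here does not prove that libpthread will
avoid HLE.

```python
def tsx_flags(cpuinfo_text):
    """Return the TSX-related flags ('hle', 'rtm') present in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines carry the CPU feature list.
        if sep and key.strip() == "flags":
            found |= {"hle", "rtm"} & set(value.split())
    return found
```

On a Linux system it could be called as tsx_flags(open("/proc/cpuinfo").read()).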

The intel-microcode package in non-free will work around the microcode
update issue by requiring that all microcode updates be applied from the
initramfs (i.e. they need a reboot to take effect, and they need an
initramfs).

Unfortunately, this is not going to be enough as most users don't have
intel-microcode installed in their Intel-based systems, and therefore would
still be at risk of data loss or data corruption due to erratum HSD136.

Please disable hardware-assisted lock elision (HLE) on x86/x86-64 Intel
Haswell processors in libpthread.


Details:


On unpatched Intel processors, HLE will hit erratum HSD136:

HSD136.  Software Using Intel® TSX May Result in Unpredictable System
         Behavior

Problem: Under a complex set of internal timing conditions and system
	 events, software using the Intel TSX (Transactional Synchronization
	 Extensions) instructions may result in unpredictable system
	 behavior.

(Erratum description from: "Desktop 4th Generation Intel Core Processor Family
Specification Update", June 2013, #328899-001).

This erratum is serious enough for Intel to take the PR hit and withdraw the
feature on all Haswell cores, including the just-launched Haswell-EP E5v3
Xeons.  (ref:
http://www.anandtech.com/show/8376/intel-disables-tsx-instructions-erratum-found-in-haswell-haswelleep-broadwelly
).

On patched Intel processors [1], Intel TSX will be disabled by the microcode.
When disabled, any Intel TSX instructions will generate an illegal opcode
trap.  Intel TSX support supposedly can be re-enabled *during system boot*
by the UEFI firmware through an undisclosed method.

Unfortunately, the act of updating the microcode will immediately disable
Intel TSX, causing all running processes linked to libpthread-2.19 to trap
and crash with SIGILL:

[ 43.606830] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.608466] microcode: CPU0 updated to revision 0x1c, date = 2014-07-03
[ 43.608494] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.609327] microcode: CPU1 updated to revision 0x1c, date = 2014-07-03
[ 43.609352] do_trap: 267 callbacks suppressed
[ 43.609354] traps: rs:main Q:Reg[1343] trap invalid opcode ip:7f32abd0b7ab sp:7f32a9062848 error:0
[ 43.609355] microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.609358] in libpthread-2.19.so[7f32abcfa000+18000]
[ 43.610204] microcode: CPU2 updated to revision 0x1c, date = 2014-07-03
[ 43.610225] microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.611081] microcode: CPU3 updated to revision 0x1c, date = 2014-07-03
[ 43.611507] traps: systemd[1] trap invalid opcode ip:7f844f84a7ab sp:7fff2ccf7e28 error:0 in libpthread-2.19.so[7f844f839000+18000]
[...]

Ref: https://bugs.launchpad.net/intel/+bug/1370352

It is unknown at this time what will happen on future microcode updates.  It
is entirely possible that the act of updating the microcode will always
reset Intel TSX to its default "disabled" state, regardless of whether the
BIOS had force-enabled it or not at boot.   This is the reason why I will
drop support for microcode updates outside of the initramfs in non-free.


Therefore, due to erratum HSD136 and the lack of widespread use of microcode
updates, libpthread-2.19 must stop using HLE on the problematic Intel
processors.

Here's the data required for the blacklist:

CPUID signature : family : model : stepping
0x000306fZ      :   6    :  63   : Z <= 2
0x000306cZ      :   6    :  60   : Z <= 3
0x0004065Z      :   6    :  69   : Z <= 1
0x0004066Z      :   6    :  70   : Z <= 1

Note: this list is not likely to be complete.  Some Engineering Sample
signatures may be missing, as well as other Haswell processor signatures we
don't know about.

You may want to consider blacklisting HLE on all Intel processors (not just
the processors above) until we are sure we know about the cpuid signature of
all processors that need blacklisting.
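
To make the table above concrete, here is a minimal sketch in Python of how
a CPUID signature could be decoded and matched against the blacklist.  It is
an illustration only, not the glibc implementation; the names HLE_BLACKLIST,
decode_signature and hle_blacklisted are hypothetical.  The decoding follows
the usual layout of the CPUID leaf 1 EAX value (stepping in bits 3:0, model
in bits 7:4, family in bits 11:8, extended model in bits 19:16, extended
family in bits 27:20).

```python
# (family, model) -> highest affected stepping, copied from the table above.
HLE_BLACKLIST = {
    (6, 63): 2,  # 0x000306fZ
    (6, 60): 3,  # 0x000306cZ
    (6, 69): 1,  # 0x0004065Z
    (6, 70): 1,  # 0x0004066Z
}


def decode_signature(sig):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is used only for base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def hle_blacklisted(sig):
    """Return True if the signature falls inside the blacklist above."""
    family, model, stepping = decode_signature(sig)
    max_stepping = HLE_BLACKLIST.get((family, model))
    return max_stepping is not None and stepping <= max_stepping
```

Since the list is probably incomplete, a False result from such a check does
not mean a processor is safe.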


[1] Haswell/Haswell-E/Haswell-EP processors running with the following
    microcode installed, or any later revision:

    sig 0x000306f2, 2014-09-03, rev 0x0029
    sig 0x000306c3, 2014-07-03, rev 0x001c
    sig 0x00040651, 2014-07-03, rev 0x001c
    sig 0x00040661, 2014-07-03, rev 0x0012

    This list is likely incomplete.
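
    As an illustration, the revisions listed in [1] can be compared
    against kernel log lines of the form shown earlier ("microcode: CPU0
    sig=0x306c3, pf=0x2, revision=0x1a").  This is a minimal sketch in
    Python; the names TSX_DISABLING_MICROCODE, parse_microcode_line and
    tsx_disabled_by_microcode are hypothetical, and the regular
    expression assumes exactly that log format.

```python
import re

# signature -> first microcode revision listed in [1]
TSX_DISABLING_MICROCODE = {
    0x000306F2: 0x29,
    0x000306C3: 0x1C,
    0x00040651: 0x1C,
    0x00040661: 0x12,
}

_SIG_LINE = re.compile(
    r"microcode: CPU(\d+) sig=0x([0-9a-fA-F]+), "
    r"pf=0x[0-9a-fA-F]+, revision=0x([0-9a-fA-F]+)"
)


def parse_microcode_line(line):
    """Return (cpu, signature, revision) from a log line, or None."""
    match = _SIG_LINE.search(line)
    if match is None:
        return None
    cpu, sig, rev = match.groups()
    return int(cpu), int(sig, 16), int(rev, 16)


def tsx_disabled_by_microcode(sig, revision):
    """True/False for a listed signature, None when it is not listed."""
    threshold = TSX_DISABLING_MICROCODE.get(sig)
    if threshold is None:
        return None  # unknown signature: the list is likely incomplete
    return revision >= threshold
```

    A None result means only that the signature is not in the list
    above, not that the processor is unaffected.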

-- System Information:
Debian Release: 7.6
  APT prefers proposed-updates
  APT policy: (990, 'proposed-updates'), (990, 'stable'), (500, 'stable-updates')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.10.55+ (SMP w/8 CPU cores)
Locale: LANG=pt_BR.UTF-8, LC_CTYPE=pt_BR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

