
Bug#800574: Final analysis for Broadwell



tag 800574 + patch
thanks

Attached is an updated version of amd64/local-blacklist-on-TSX-Haswell.diff.
I believe it should be renamed to
"amd64/local-blacklist-for-Intel-TSX.diff", as it no longer covers only
Intel Haswell.

The updated patch has been package-compile-tested on glibc 2.19-22.

This new version of the blacklist patch updates the patch header text and
the blacklist code comments.  It doesn't change anything for Haswell.  It
adds the current Broadwell CPU models and steppings to the blacklist.

Broadwell-H with a very recent microcode update (rev 0x12, from
2015-06-04) was confirmed to have broken TSX-NI (RTM) and to _leave it
enabled_ in CPUID, causing glibc with lock elision enabled to SIGSEGV. 
An even more recent Broadwell-H microcode update, rev 0x13 from
2015-08-03, is confirmed to (finally) disable the HLE and RTM CPUID
bits.  This should make blacklisting signature 0x40671 uncontroversial.

Refer to https://bugzilla.kernel.org/show_bug.cgi?id=103351 for details.
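
For readers unfamiliar with how a signature such as 0x40671 maps to the
family/model/stepping values tested in the patch, here is a minimal sketch
in C.  The names cpu_sig, decode_sig and tsx_blacklisted are hypothetical
and are not part of glibc; the model and stepping limits are copied from
the attached patch, and the vendor check is omitted (an Intel CPU is
assumed).

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields decoded from the CPUID leaf 1 EAX signature. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Hypothetical helper: split a signature into family/model/stepping. */
static struct cpu_sig decode_sig(uint32_t eax)
{
    struct cpu_sig s;
    s.family   = (eax >> 8) & 0x0f;
    s.model    = (eax >> 4) & 0x0f;
    s.stepping = eax & 0x0f;
    /* The extended model field is used by families 6 and 15. */
    if (s.family == 0x06 || s.family == 0x0f)
        s.model += (eax >> 12) & 0xf0;
    /* The extended family field is used by family 15 only. */
    if (s.family == 0x0f)
        s.family += (eax >> 20) & 0xff;
    return s;
}

/* Hypothetical helper mirroring the patch condition (Intel assumed). */
static bool tsx_blacklisted(uint32_t eax)
{
    struct cpu_sig s = decode_sig(eax);
    if (s.family != 6)
        return false;
    switch (s.model) {
    case 63: return s.stepping <= 2;   /* Haswell, model 0x3f */
    case 60: return s.stepping <= 3;   /* Haswell, model 0x3c */
    case 69:                           /* Haswell, model 0x45 */
    case 70:                           /* Haswell, model 0x46 */
    case 71: return s.stepping <= 1;   /* Broadwell-H, model 0x47 */
    case 61: return s.stepping <= 4;   /* Broadwell, model 0x3d */
    case 86: return s.stepping <= 2;   /* Broadwell, model 0x56 */
    default: return false;
    }
}
```

With this decoding, 0x40671 is family 6, model 71, stepping 1, so it is
blacklisted, while 0x306F4 (Haswell-EX) and any 0x406Fx signature
(Broadwell-E) are not.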

This version of the blacklist patch leaves upcoming Broadwell-E
unblacklisted.  It also leaves Skylake unblacklisted, as I have not been
able to confirm whether the newest Skylake-S microcode updates have
working Intel TSX-NI, or have it disabled.
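
Whether a given microcode revision still advertises TSX can be read from
the HLE and RTM bits in EBX of CPUID leaf 7 (subleaf 0), the same bits the
patch masks.  A minimal sketch in C follows; the helper names are
hypothetical, and the EBX value is assumed to have been obtained
elsewhere (for example with __get_cpuid_count() from GCC's <cpuid.h>).
Note that it only reports what CPUID advertises, not whether TSX actually
works.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 7, subleaf 0, EBX feature bits (same values as the patch). */
#define BIT_HLE (1u << 4)
#define BIT_RTM (1u << 11)

/* Hypothetical helper: true if either HLE or RTM is advertised. */
static bool tsx_advertised(uint32_t leaf7_ebx)
{
    return (leaf7_ebx & (BIT_HLE | BIT_RTM)) != 0;
}

/* Hypothetical helper: clear both bits, as the blacklist does for the
   cached CPUID data. */
static uint32_t mask_tsx(uint32_t leaf7_ebx)
{
    return leaf7_ebx & ~(BIT_HLE | BIT_RTM);
}
```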

I propose that the updated blacklist patch be added to glibc in
unstable and, after it has spent a few weeks in testing, that it
also be added to stable through a stable update.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique de Moraes Holschuh <hmh@debian.org>

Intel TSX is broken on Haswell-based processors (erratum HSD136/HSW136)
and a microcode update is available that simply disables the corresponding
instructions.

A live microcode update will disable the TSX instructions, causing
already started binaries to segfault.  This patch simply disables Intel
TSX (HLE and RTM) on processors which might receive such a microcode
update, so that this doesn't happen.  We might expect newer steppings to
fix the issue (e.g. as Haswell-EX did).

Intel TSX-NI is also broken on Broadwell systems, and is documented as
unavailable in the errata lists of their specification updates.  However,
some end-user systems were shipped with old microcode that left Intel
TSX-NI still enabled in CPUID on these processors.  We must not allow
RTM to be used by glibc on these systems, because of runtime system
misbehavior and the hazards of live microcode updates.

Author: Henrique de Moraes Holschuh <hmh@debian.org>

Index: glibc-2.19/sysdeps/x86_64/multiarch/init-arch.c
===================================================================
--- glibc-2.19.orig/sysdeps/x86_64/multiarch/init-arch.c	2014-02-07 07:04:38.000000000 -0200
+++ glibc-2.19/sysdeps/x86_64/multiarch/init-arch.c	2015-10-07 09:07:59.272156212 -0300
@@ -26,7 +26,7 @@
 
 
 static void
-get_common_indeces (unsigned int *family, unsigned int *model)
+get_common_indeces (unsigned int *family, unsigned int *model, unsigned int *stepping)
 {
   __cpuid (1, __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax,
 	   __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ebx,
@@ -36,6 +36,7 @@
   unsigned int eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
   *family = (eax >> 8) & 0x0f;
   *model = (eax >> 4) & 0x0f;
+  *stepping = eax & 0x0f;
 }
 
 
@@ -47,6 +48,7 @@
   unsigned int edx;
   unsigned int family = 0;
   unsigned int model = 0;
+  unsigned int stepping = 0;
   enum cpu_features_kind kind;
 
   __cpuid (0, __cpu_features.max_cpuid, ebx, ecx, edx);
@@ -56,7 +58,7 @@
     {
       kind = arch_kind_intel;
 
-      get_common_indeces (&family, &model);
+      get_common_indeces (&family, &model, &stepping);
 
       unsigned int eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
       unsigned int extended_family = (eax >> 20) & 0xff;
@@ -131,7 +133,7 @@
     {
       kind = arch_kind_amd;
 
-      get_common_indeces (&family, &model);
+      get_common_indeces (&family, &model, &stepping);
 
       ecx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ecx;
 
@@ -176,6 +178,24 @@
 	}
     }
 
+  /* Disable Intel TSX (HLE and RTM) due to erratum HSD136/HSW136
+     on all Haswell processors, except Haswell-EX/Xeon E7-v3 (306F4),
+     to work around outdated microcode that doesn't disable the
+     broken feature by default.
+
+     Disable TSX on Broadwell, due to errata BDM53/BDW51/BDD51/
+     BDE42.  The errata documentation states that RTM is unusable,
+     and that it should not be advertised by CPUID at all on any
+     such processors.  Unfortunately, it _is_ advertised in some
+     (older) microcode versions.  Exceptions: Broadwell-E (406Fx),
+     likely already fixed at launch */
+  if (kind == arch_kind_intel && family == 6 &&
+      ((model == 63 && stepping <= 2) || (model == 60 && stepping <= 3) ||
+       (model == 69 && stepping <= 1) || (model == 70 && stepping <= 1) ||
+       (model == 61 && stepping <= 4) || (model == 71 && stepping <= 1) ||
+       (model == 86 && stepping <= 2) ))
+    __cpu_features.cpuid[COMMON_CPUID_INDEX_7].ebx &= ~(bit_RTM | bit_HLE);
+
   __cpu_features.family = family;
   __cpu_features.model = model;
   atomic_write_barrier ();
Index: glibc-2.19/sysdeps/x86_64/multiarch/init-arch.h
===================================================================
--- glibc-2.19.orig/sysdeps/x86_64/multiarch/init-arch.h	2014-02-07 07:04:38.000000000 -0200
+++ glibc-2.19/sysdeps/x86_64/multiarch/init-arch.h	2015-10-06 09:43:18.000000000 -0300
@@ -40,6 +40,7 @@
 
 /* COMMON_CPUID_INDEX_7.  */
 #define bit_RTM		(1 << 11)
+#define bit_HLE		(1 << 4)
 
 /* XCR0 Feature flags.  */
 #define bit_XMM_state  (1 << 1)
