
Bug#800574: Final analysis for Broadwell



On Sun, 18 Oct 2015, Aurelien Jarno wrote:
> On 2015-10-07 07:32, Henrique de Moraes Holschuh wrote:
> > Meanwhile, a suggestion by Samuel Thibault to try to use hwcap did
> > provide for a possible long-term plan to fine-tune the lock-elision
> > blacklist (and anything else of that sort).
> > 
> > We would have to (finally) extend x86-64 hwcap to cpuid(1) fully, and
> > also at least cpuid(7), which is anything but trivial and a lot of work.
> >  This is _not_ worth the trouble if it is done just for lock elision
> > blacklisting purposes.
> > 
> > However, it would be useful for link-time optimization in libraries
> > (e.g. avx2 flavours of something that really benefits from it, etc), so
> > it is likely worth pursuing... but only if we get buy-in from upstream.
> 
> Why do you believe that hwcap is better for handling that than the
> current STT_GNU_IFUNC mechanism?

I was not aware of STT_GNU_IFUNC.  I will look into it.
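For reference, here is a minimal sketch of the STT_GNU_IFUNC mechanism using the GCC `ifunc` attribute. Clang also accepts it on ELF/glibc targets; musl does not support IFUNC. The function `sum_bytes` and its two variants are hypothetical. The resolver runs once, when the dynamic linker processes the symbol's relocation. Its return value becomes the address that every later call jumps to, so the CPU feature check is not repeated on each call.

```c
#include <stddef.h>

/* Signature shared by every variant and by the public symbol. */
typedef unsigned long (*sum_bytes_fn)(const unsigned char *, size_t);

/* Baseline variant: runs on any CPU. */
static unsigned long sum_bytes_generic(const unsigned char *p, size_t n)
{
    unsigned long s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, compiled so the compiler may auto-vectorize it with AVX2. */
__attribute__((target("avx2")))
static unsigned long sum_bytes_avx2(const unsigned char *p, size_t n)
{
    unsigned long s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}
#endif

/* Resolver: called by the dynamic linker, possibly before constructors,
   hence the explicit __builtin_cpu_init(). */
static sum_bytes_fn resolve_sum_bytes(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_bytes_avx2;
#endif
    return sum_bytes_generic;
}

/* Emitted as an STT_GNU_IFUNC symbol bound to the resolver's choice. */
unsigned long sum_bytes(const unsigned char *p, size_t n)
    __attribute__((ifunc("resolve_sum_bytes")));
```

Callers simply call `sum_bytes()`. The selection is per symbol, so only the functions that actually benefit need a second variant.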

> 5) Finally it means that we need to provide a version of the libc for
> all combinations. Think of i386, we would need to provide:
>  - libc6
>  - libc6-i686
>  - libc6-i686-tsx
>  - libc6-xen
>  - libc6-xen-tsx

No.  We need nothing of the sort for Intel TSX.  TSX-NI is already
detected at runtime by glibc using the cpuid instruction, so there is
no need to use the dynamic loader's hwcap-based object selection for it.
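Below is a minimal sketch of that kind of direct runtime check, using the `<cpuid.h>` helpers that ship with GCC and Clang. It is not glibc's actual code. It only tests the RTM bit, CPUID.(EAX=7,ECX=0):EBX bit 11, which advertises the XBEGIN/XEND/XABORT instructions used for lock elision.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.(EAX=7,ECX=0):EBX bit 11: Restricted Transactional Memory. */
#define CPUID7_EBX_RTM (1u << 11)

/* Ask the processor directly whether it advertises RTM. */
bool cpu_has_rtm(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 is only valid if the highest basic leaf is at least 7. */
    if (__get_cpuid_max(0, NULL) < 7)
        return false;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;
    return (ebx & CPUID7_EBX_RTM) != 0;
#else
    return false;   /* not an x86 processor: no TSX */
#endif
}
```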

What I proposed was to extend the kernel-supplied hwcap area for x86-64
(and x32, I suppose) to export to all processes the full feature flags
returned by CPUID with EAX=1 and with EAX=7... and to use _that_,
instead of executing the cpuid instruction directly, to detect Intel TSX
(and anything else glibc derives from cpuid(1) and cpuid(7)).
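For comparison, this is roughly what the kernel-supplied hwcap offers today on x86 Linux. AT_HWCAP carries only CPUID.(EAX=1):EDX. The ECX word of cpuid(1) and the leaf 7 flags, where RTM lives, are not in it. The following is a minimal sketch that assumes Linux and the `getauxval()` interface from `<sys/auxv.h>`. The SSE2 bit is only an example.

```c
#include <stdbool.h>

#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* On x86, AT_HWCAP bit 26 mirrors CPUID.(EAX=1):EDX bit 26 (SSE2). */
#define X86_HWCAP_SSE2 (1ul << 26)

/* Read a feature from the kernel's hwcap word instead of executing cpuid. */
bool hwcap_has_sse2(void)
{
#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
    unsigned long hwcap = getauxval(AT_HWCAP);   /* 0 if entry is absent */
    return (hwcap & X86_HWCAP_SSE2) != 0;
#else
    return false;   /* bit layout differs off x86, or no auxv at all */
#endif
}
```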

We could do it for 32-bit too, I suppose.  But if hardware-assisted lock
elision is important enough to justify that kind of work for i686, that
just means we should deploy x32 instead, as far as I'm concerned.

Then, change glibc to use this extended hwcap information to detect such
runtime-selected features instead of executing the cpuid instruction
directly.  On an older kernel without the extended hwcap fields, either
call cpuid directly or disable those features.
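The proposed lookup order can be sketched as follows. `kernel_hwcap_cpuid7_ebx()` is HYPOTHETICAL: no such auxiliary-vector field exists. The stub therefore always reports the field as missing, as an older kernel would, and the `fallback` policy (also hypothetical) then decides between calling cpuid directly and disabling the feature.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define CPUID7_EBX_RTM (1u << 11)   /* CPUID.(EAX=7,ECX=0):EBX bit 11 */

/* HYPOTHETICAL stand-in for the proposed extended hwcap field holding
   CPUID.(EAX=7,ECX=0):EBX.  Always "not available", like an old kernel. */
static bool kernel_hwcap_cpuid7_ebx(unsigned int *ebx)
{
    (void)ebx;
    return false;
}

/* What to do when the kernel does not export the extended field. */
enum fallback { FALLBACK_CPUID, FALLBACK_DISABLE };

bool feature_rtm(enum fallback policy)
{
    unsigned int ebx = 0;

    /* Preferred source: the kernel-supplied flags. */
    if (kernel_hwcap_cpuid7_ebx(&ebx))
        return (ebx & CPUID7_EBX_RTM) != 0;

    /* Older kernel, option 1: treat the feature as absent. */
    if (policy == FALLBACK_DISABLE)
        return false;

    /* Older kernel, option 2: ask the processor directly. */
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ecx, edx;
    if (__get_cpuid_max(0, NULL) < 7)
        return false;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;
    return (ebx & CPUID7_EBX_RTM) != 0;
#else
    return false;
#endif
}
```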

However, for something like AVX512, you might want the *entire* library
compiled for a much more advanced instruction set (since the
availability of AVX512 also implies that very fast SSE4.2 is available,
for example).  The dynamic linker's hwcap support could be used to do
that, if one wanted to.
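The following is a simplified, hypothetical illustration of selecting an *entire* library build by CPU features, which is the choice the dynamic linker's hwcap subdirectory search automates. The directory names are made up, and a real loader searches paths rather than returning a string. The point is that the AVX512 build may assume SSE4.2 and AVX2 throughout, not only in hand-picked functions.

```c
/* HYPOTHETICAL directories, each holding a complete build of the same
   library compiled for a different instruction-set baseline. */
const char *pick_library_dir(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Checked from most to least capable; the first match wins. */
    if (__builtin_cpu_supports("avx512f") &&
        __builtin_cpu_supports("avx2") &&
        __builtin_cpu_supports("sse4.2"))
        return "/usr/lib/example/avx512";
    if (__builtin_cpu_supports("avx2") &&
        __builtin_cpu_supports("sse4.2"))
        return "/usr/lib/example/avx2";
#endif
    return "/usr/lib/example/generic";   /* baseline build, any CPU */
}
```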

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

