
Re: SIMDebian: Debian Partial Fork with Radical ISA Baseline



Hi Guillem,

Thanks for your helpful pointers.

On Sat, Apr 06, 2019 at 10:55:35PM +0200, Guillem Jover wrote:
> If what you are interested in though is just a small subset of the
> archive, another option that would benefit everyone and is perhaps
> less cumbersome than having to juggle around with multiple archives
> and package rebuilds/variants, is to make use of libc's hwcaps [H]
> support, which means the dynamic linker will automatically load the
> best optimized shared object for the current hardware. This of course
> can complicate a bit the packaging, and bloat it, but if the performance
> improvement is substantial, it might be a very good trade-off.
>   [H] man ld.so "NOTES" / "Hardware capabilities"

This sounds like a nice feature. Unfortunately, the "avx2" and
"avx512" capabilities I wanted do not show up in that list. IIRC, in my
original post I presented a C++ example using Eigen (a header-only
library). Reverse dependencies such as TensorFlow would benefit from
hwcaps if ld.so supported amd64's avx2 and avx512.
 
> Another option which requires upstream code changes (and ideally them
> being complicit) is to add run-time selection for the more suitable
> optimized functions, for example via the __target__ and __ifunc__ [I]
> function __attribute__ (and __builtin_cpu_supports or __builtin_cpu_is),
> or the __target_clones__ function __attribute__. Perhaps also of
> interest is the __simd__ function __attribute__.
> 
>   [I] info gcc "Function Attributes";
>       <https://sourceware.org/glibc/wiki/GNU_IFUNC>

This compiler feature (which has been considered in the past) is quite a
good solution for small projects. However, it is not easy to enforce for
projects like TensorFlow ...
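
For illustration, here is a minimal sketch of the run-time selection
described above, using the __target__ function attribute together with
__builtin_cpu_supports. The names dot_generic, dot_avx2, select_dot and
dot are hypothetical and do not come from any real project. The sketch
assumes GCC or Clang; on anything other than x86-64 it simply falls back
to the baseline function.

```c
#include <stddef.h>

/* Baseline version, compiled for the default ISA of the build. */
static float dot_generic(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_AVX2_VARIANT 1
/* Same source, but the compiler may emit AVX2 code for this function only. */
static __attribute__((target("avx2")))
float dot_avx2(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
#endif

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Pick the best variant that the running CPU actually supports. */
static dot_fn select_dot(void)
{
#ifdef HAVE_AVX2_VARIANT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return dot_avx2;
#endif
    return dot_generic;
}

/* Public entry point: resolves the variant on first use, then reuses it. */
float dot(const float *a, const float *b, size_t n)
{
    static dot_fn impl;
    if (!impl)
        impl = select_dot();
    return impl(a, b, n);
}
```

Every hot function needs its own variants and dispatcher (or an ifunc
resolver), which is manageable in a small code base but hard to apply
across a large one built around a header-only template library.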

