Re: how best to package when using hardware vectorization with vector-unit specific code?

To: debian-mentors@lists.debian.org, "Kay F. Jahnke" <_kfj@yahoo.com>
Subject: Re: how best to package when using hardware vectorization with vector-unit specific code?
From: Wookey <wookey@wookware.org>
Date: Wed, 10 May 2017 10:52:39 +0100
Message-id: <20170510095239.GF27127@mail.wookware.org>
In-reply-to: <da5b2a17-8e62-8339-476f-4da1fdf190d2@yahoo.com>
References: <da5b2a17-8e62-8339-476f-4da1fdf190d2@yahoo.com>

On 2017-05-10 09:17 +0200, Kay F. Jahnke wrote:
> Hi group!
> 
> I have code which optionally makes use of hardware vectorization. This is
> done generically by using Vc:
> 
> https://github.com/VcDevel/Vc
> 
> When compiling with Vc, the resultant machine code is for a specific vector
> unit only, like AVX or SSE. There are several possible ways of dealing with
> these processor-dependent binaries:
> 
> - create a set of complete target-specific executables and select which one
> to deploy/run on the target machine
> 
> - create a single binary with all variants linked in, calling only
> target-specific code at run time
> 
> - create a set of shared libraries, deploy one or all and load the
> target-specific one at run-time
> 
> - create only one compromise binary using some commonly available vector
> unit
> 
> The first alternative is nice because the binary is small and simple, but
> the binary will only run on a specific target, so there would have to be a
> way to do target-specific deployment, or, alternatively, a population of
> additional superfluous binaries cluttering .../bin. So far, I have only seen
> architecture-dependent packages, and I haven't managed to figure out if the
> package installation process can be made more specific to deploy only code
> for a specific vector unit. But I'd like to go along this path if possible.

Debian requires packages to run on the baseline ISA defined for each
architecture (which does change slowly over time). I don't know what
level of vectorisation that implies on other arches (perhaps SSE can
be assumed on x86_64 or i386?), but on armel and armhf it assumes
no vector unit (i.e. you cannot assume that NEON instructions are
present: there must be a runtime check before using them). On arm64
NEON is part of the base spec so you can assume that it is
present. (In practice almost no armel hardware has NEON, while the
very large majority of armhf hardware does.)

There is (as yet) no mechanism in packaging to select packages by
hardware variant or optimisation. It has been mooted, and could be
done, but it's a big job, which would take years to roll out, and
no-one has stepped up to make it work. So for now your favourite
mechanism is not possible.

> The second alternative would require case-switching inside the code

> The third alternative is [...] tearing the code apart into the
> 'main' program and some library doing the number crunching.

> The fourth alternative is to create a target using only
> SSE instructions, which are available on most machines.

Does this software only work on x86 or does it work on other
architectures, with other vector units (neon, altivec)? Remember to
consider more than just x86 when pondering this issue.

If at all possible you should arrange for the software to work for all
Debian arches on the base spec. It is obviously then highly worthwhile
using hardware optimisations where available at runtime.

Which method you use inside the codebase to cope with different
hardware is up to you. Various libraries and mechanisms exist for this
sort of optimisation-switching, such as ifunc in glibc. You don't say
what language your codebase is in.

I would agree with you that moving this code into a library is a
cleaner solution, but internal case-switching will also work fine. Use
the HWCAP mechanism (e.g. getauxval(AT_HWCAP) on Linux) to determine
at runtime what vector unit, if any, is available.
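
As a rough sketch of the internal case-switching approach (not tested
against your code, and assuming GCC or clang on Linux): detect the
vector unit once, then call through a function pointer. All names here
(scale, scale_base, scale_vector, have_vector_unit, pick_scale) are
made up for illustration, and the "vector" variant is only a stand-in.
In a real build it would live in its own source file compiled with the
relevant flags (e.g. -mavx or -mfpu=neon), so that the rest of the
binary stays on the baseline ISA.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif

typedef void (*scale_fn)(float *dst, const float *src, float k, size_t n);

/* Baseline variant: must run on every machine of the architecture. */
static void scale_base(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Stand-in for the vector-unit variant (hypothetical). Same result as
 * the baseline; only the instructions used would differ in a real build. */
static void scale_vector(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Returns non-zero if the vector unit this sketch cares about is usable. */
static int have_vector_unit(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();                      /* GCC/clang builtin */
    return __builtin_cpu_supports("avx") != 0;
#elif defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#elif defined(__aarch64__)
    return 1;                                  /* NEON is in the base spec */
#else
    return 0;                                  /* unknown: stay on baseline */
#endif
}

/* Choose the implementation matching the running hardware. */
static scale_fn pick_scale(void)
{
    return have_vector_unit() ? scale_vector : scale_base;
}

/* Public entry point: resolves the variant on first use, then reuses it. */
void scale(float *dst, const float *src, float k, size_t n)
{
    static scale_fn impl;

    assert(dst != NULL && src != NULL);
    if (!impl)
        impl = pick_scale();
    impl(dst, src, k, n);
}
```

Callers only ever see scale(), so the package still works on the base
spec and picks up the faster variant where the hardware has it. The
lazy initialisation is not synchronised; in a threaded program you
would want to resolve the pointer once at startup instead.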

You are not the first person with this problem so there is probably
some code already available for the checks and switching in your
language. For arm there is the ne10 package, which provides useful
optimised NEON functions. It doesn't help with other architectures or
with the fallback/variant-switching part, but it may still be useful.

Wookey
-- 
Principal hats:  Linaro, Debian, Wookware, ARM
http://wookware.org/
