Re: how best to package when using hardware vectorization with vector-unit specific code?

To: debian-mentors@lists.debian.org
Cc: debian-mentors@lists.debian.org
Subject: Re: how best to package when using hardware vectorization with vector-unit specific code?
From: "Kay F. Jahnke" <_kfj@yahoo.com>
Date: Thu, 11 May 2017 09:33:37 +0200
Message-id: <33358dc8-8652-cf7c-772f-e210e58f2bfc@yahoo.com>
In-reply-to: <20170510174211.GM27127@mail.wookware.org>
References: <da5b2a17-8e62-8339-476f-4da1fdf190d2@yahoo.com> <20170510095239.GF27127@mail.wookware.org> <39d8a65c-3ef4-1e19-49ee-05f09b544306@iwakd.de> <bb141a61-83a0-57dc-731a-4833c36861ed@yahoo.com> <20170510174211.GM27127@mail.wookware.org>

On 10.05.2017 at 19:42, Wookey wrote:

On 2017-05-10 18:01 +0200, Kay F. Jahnke wrote:

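#! /bin/bash

# Walk the list from oldest to newest instruction set and remember the
# last one that lscpu reports, so bestarch ends up as the newest match.
for instruction_set in mmx sse sse2 sse3 ssse3 sse4 sse4a sse4.1 sse4.2 avx \
                       avx2 avx512f avx512pf avx512er avx512cd
do
  # -q: only the exit status matters; -w: match whole words, so that
  # e.g. "sse" does not match "ssse3"
  if lscpu | grep -qw -- "$instruction_set"
  then
    bestarch=$instruction_set
  fi
done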

Because this is install-time rather than run-time detection, it would go
wrong in a range of circumstances, so it is frowned upon. (Installing images,
hardware which gets upgraded, keeping the OS image, cross-installing,
NFS-mounting, containers etc.)

Okay, I did not think of that. Kind of a show-stopper for my simple-minded plan.

But yes, it is possible in the absence of more correct solutions. It
would be much better to run such a 'choose-binary' script at runtime
and have it run the right one as that would work in all the
circumstances I can think of offhand.

So why don't I use a run-time chooser then? I am currently doing that with the shell script above, simply passing on all arguments to a call to myprogram_$bestarch. Of course this would have to be extended to be more comprehensive, but it could always fall back on the scalar variant if it can't positively identify a friendly environment. Alternatively I could have C++ code doing the job. What's better? Can I rely on a specific shell to be present on all systems debian runs on, and on lscpu? Or is there possibly even a ready-made solution just for the purpose?
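
A minimal sketch of what such a run-time chooser could look like, written for plain /bin/sh and reading /proc/cpuinfo instead of calling lscpu. The variant names (avx2, avx, sse, scalar), the myprogram_<variant> naming and the install path in the final comment are illustrative assumptions, not something settled in this thread, and the flag lookup only works on Linux/x86. Anything it cannot positively identify falls through to the scalar build.

```shell
#!/bin/sh
# Sketch of a run-time chooser. Variant names and the myprogram_<variant>
# naming scheme are hypothetical.

# Print the best variant for a space-separated list of CPU flags.
pick_variant() {
    flags=" $1 "
    for candidate in avx2 avx sse; do      # most capable first
        case "$flags" in
            *" $candidate "*) echo "$candidate"; return 0 ;;
        esac
    done
    echo scalar                            # runs-everywhere fallback
}

# Print the flags of the first CPU listed in /proc/cpuinfo (Linux/x86).
# A missing file or a missing "flags" line yields an empty list, which
# pick_variant maps to "scalar".
cpu_flags() {
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo 2>/dev/null | head -n 1
}

variant=$(pick_variant "$(cpu_flags)")
echo "would run: myprogram_$variant"
# A real wrapper would replace the echo with something like
# (hypothetical path):
#   exec "/usr/lib/myprogram/myprogram_$variant" "$@"
```

Keeping the selection in a function that takes the flag list as an argument means the logic can be checked without the matching hardware, for example with pick_variant "fpu sse sse2 avx".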

How fat would 15 versions of the program be (on x86)? Do you really
need all 15? Might a subset suffice?

Not really 15, I think even four would be good enough - if the processor doesn't even have SSE it's a bit slow for that kind of application anyway, so I'd say at least SSE, AVX, and AVX2, plus the scalar version as a runs-everywhere fallback. And the code itself is slim; I prefer to link libVc.a in statically for performance reasons, but SFML and vigra can be linked dynamically. The binaries are ca. 1MB each.

Where should the architecture-dependent binaries go in the target's filesystem, to make sure they're not in the execution path accidentally?

Does this software only work on x86 or does it work on other
architectures, with other vector units (neon, altivec)? Remember to
consider more than just x86 when pondering this issue.

I am using Vc, so whatever Vc supports, my software supports as well. Vc is
a generic C++ library to abstract away the architecture. I've coded so that
my program will also run without using the vector units.

OK. Looks like neon support is 'in development'. And you can run on
non-vectorised hardware (but only very slowly).

In fact non-vectorized performance isn't all that bad, the program is very memory-bound with lots of DDA and irregular, possibly widely scattered memory access patterns. Vectorization speeds up the processing pipelines only - AVX2 roughly halves my rendering times.

Kay
