On 2017-05-10 09:17 +0200, Kay F. Jahnke wrote:
> Hi group!
>
> I have code which optionally makes use of hardware vectorization. This is
> done generically by using Vc:
>
> https://github.com/VcDevel/Vc
>
> When compiling with Vc, the resultant machine code is for a specific vector
> unit only, like AVX or SSE. There are several possible ways of dealing with
> these processor-dependent binaries:
>
> - create a set of complete target-specific executables and select which one
> to deploy/run on the target machine
>
> - create a single binary with all variants linked in, calling only
> target-specific code at run time
>
> - create a set of shared libraries, deploy one or all and load the
> target-specific one at run-time
>
> - create only one compromise binary using some commonly available vector
> unit
>
> The first alternative is nice because the binary is small and simple, but
> the binary will only run on a specific target, so there would have to be a
> way to do target-specific deployment, or, alternatively, a population of
> additional superfluous binaries cluttering .../bin. So far, I have only seen
> architecture-dependent packages, and I haven't managed to figure out if the
> package installation process can be made more specific to deploy only code
> for a specific vector unit. But I'd like to go along this path if possible.

Debian requires packages to run on the base-level ISA defined for each
architecture (which does change slowly over time). I don't know what level
of vectorisation that implies on other arches (perhaps SSE can be assumed on
x86_64 or i386?), but on armel and armhf it assumes no vector unit, i.e. you
cannot assume that NEON instructions are present: there must be a runtime
check before using them. On arm64 neon is part of the base spec, so you can
assume that it is present. (In practice almost no armel hardware has neon,
while the very large majority of armhf hardware does.)

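
A minimal sketch of what such a runtime check could look like in C++, assuming GCC or Clang on Linux. All names here (VectorUnit, detect_vector_unit, select_sum and so on) are hypothetical and come neither from Vc nor from any Debian tooling. The x86 branch relies on the compiler builtin __builtin_cpu_supports, the ARM branches on getauxval(AT_HWCAP); any other platform falls back to the base ISA.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
#include <sys/auxv.h>   // getauxval(), AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_NEON (32-bit ARM), HWCAP_ASIMD (arm64)
#endif

// Hypothetical names throughout, chosen for illustration only.
enum class VectorUnit { None, SSE2, AVX, AVX2, NEON };

// Ask the CPU we are running on (not the build machine) what it offers.
VectorUnit detect_vector_unit() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();  // GCC/Clang builtin, initialises the CPU feature data
    if (__builtin_cpu_supports("avx2")) return VectorUnit::AVX2;
    if (__builtin_cpu_supports("avx"))  return VectorUnit::AVX;
    if (__builtin_cpu_supports("sse2")) return VectorUnit::SSE2;
    return VectorUnit::None;
#elif defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) ? VectorUnit::NEON : VectorUnit::None;
#elif defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) ? VectorUnit::NEON : VectorUnit::None;
#else
    return VectorUnit::None;  // unknown platform: stay on the base ISA
#endif
}

const char* vector_unit_name(VectorUnit unit) {
    switch (unit) {
        case VectorUnit::SSE2: return "sse2";
        case VectorUnit::AVX:  return "avx";
        case VectorUnit::AVX2: return "avx2";
        case VectorUnit::NEON: return "neon";
        case VectorUnit::None: return "none";
    }
    return "none";
}

using SumFn = float (*)(const float*, std::size_t);

// Base-ISA fallback that has to work on every machine of the architecture.
float sum_scalar(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Choose an implementation once. Vectorised variants would live in separate
// translation units built with the matching compiler flags, for example:
//   if (unit == VectorUnit::AVX2) return sum_avx2;  // hypothetical variant
SumFn select_sum(VectorUnit unit) {
    (void)unit;  // only the fallback exists in this sketch
    return sum_scalar;
}
```

A caller would run the detection once at startup, e.g. `SumFn sum = select_sum(detect_vector_unit());`, and then call through the pointer, so code for a vector unit is only reached on hardware that reports it.
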
There is (as yet) no mechanism in packaging to select packages by hardware
variant or optimisation. It has been mooted, and could be done, but it's a
big job, which would take years to roll out, and no-one has stepped up to
make it work. So for now your favourite mechanism is not possible.

> The second alternative would require case-switching inside the code

> The third alternative is [...] tearing the code apart into the
> 'main' program and some library doing the number crunching.

> The fourth alternative is to create a target using only
> SSE instructions, which are available on most machines.

Does this software only work on x86 or does it work on other architectures,
with other vector units (neon, altivec)? Remember to consider more than just
x86 when pondering this issue.

If at all possible you should arrange for the software to work for all
Debian arches on the base spec. It is obviously then highly worthwhile using
hardware optimisations where available at runtime.

Which method you use inside the codebase to cope with different hardware is
up to you. Various libraries and mechanisms exist for this sort of
optimisation-switching, such as ifunc in glibc. You don't say what language
your codebase is in. I would agree with you that moving this code into a
library is a cleaner solution, but internal case-switching will also work
fine. Use the HWCAPS mechanism to determine at runtime what vector unit, if
any, is available.

You are not the first person with this problem, so there is probably some
code already available for the checks and switching in your language.

For arm there is the ne10 package of useful optimised neon functions. It
doesn't help with any other architectures, or with the
fallback/variant-switching part, but it may still be helpful.

Wookey
-- 
Principal hats: Linaro, Debian, Wookware, ARM
http://wookware.org/