How best to package code that uses vector-unit-specific hardware vectorization?
Hi group!
I have code which optionally makes use of hardware vectorization. This
is done generically by using Vc:
https://github.com/VcDevel/Vc
When compiling with Vc, the resultant machine code is for a specific
vector unit only, like AVX or SSE. There are several possible ways of
dealing with these processor-dependent binaries:
- create a set of complete target-specific executables and select which
one to deploy/run on the target machine
- create a single binary with all variants linked in, which calls only
the code matching the target at run time
- create a set of shared libraries, deploy one or all and load the
target-specific one at run-time
- create only one compromise binary using some commonly available vector
unit
The first alternative is nice because the binary is small and simple,
but the binary will only run on a specific target, so there would have
to be a way to do target-specific deployment, or, alternatively, a
population of additional superfluous binaries cluttering .../bin. So
far, I have only seen architecture-dependent packages, and I haven't
managed to figure out whether the package installation process can be
made more fine-grained, so that it deploys only the code for a specific
vector unit. But I'd like to go down this path if possible.
The second alternative would require case-switching inside the code
(which has to be maintained for every new vector unit coming along)
and would make the build more complex: it needs a set of object files
with individually named versions of the code to be called, and linking
might be difficult. It would also bloat the binary. Yet it would
provide a single binary useful for all targets, so packaging should be
simpler.
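For illustration, the case-switching in the second alternative might be
sketched roughly as below. This is only a sketch under assumptions: the
namespaces pv_scalar, pv_sse2 and pv_avx2, the function sum and the helper
names are all hypothetical, the variant bodies are placeholders rather than
real Vc code, and CPU detection relies on the GCC/Clang builtin
__builtin_cpu_supports, which is available on x86 only.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the per-target builds. In a real build each
// namespace would be its own translation unit compiled with its own flags
// (e.g. -msse2, -mavx2), which is why each variant needs a distinct name.
namespace pv_scalar {
float sum(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}
}  // namespace pv_scalar

namespace pv_sse2 {
// Placeholder body: the real variant would be the vectorized code built for SSE2.
float sum(const float* p, std::size_t n) { return pv_scalar::sum(p, n); }
}  // namespace pv_sse2

namespace pv_avx2 {
// Placeholder body: the real variant would be the vectorized code built for AVX2.
float sum(const float* p, std::size_t n) { return pv_scalar::sum(p, n); }
}  // namespace pv_avx2

using sum_fn = float (*)(const float*, std::size_t);

// The case switch: this is the part that needs a new branch for every
// vector unit that comes along.
sum_fn select_sum() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return pv_avx2::sum;
    if (__builtin_cpu_supports("sse2")) return pv_sse2::sum;
#endif
    return pv_scalar::sum;  // fallback for other compilers and CPUs
}

// Resolve the variant once, on first use, then reuse the cached pointer.
float dispatched_sum(const float* p, std::size_t n) {
    static const sum_fn fn = select_sum();
    return fn(p, n);
}
```

The rest of the program would only ever call dispatched_sum, so the
target-specific names stay confined to the dispatcher.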
The third alternative is also an interesting option, but it would
require tearing the code apart into the 'main' program and some library
doing the number crunching. The case-switching inside the 'main' code
would also require maintenance over time, and deploying all versions of
the .so would also be a waste of space.
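As a rough sketch of what the 'main' side of the third alternative could
look like on a POSIX system: the library naming scheme
(libpv_kernels_<isa>.so), the exported symbol pv_sum and all helper names
are assumptions made up for this example, and the CPU check again uses the
GCC/Clang x86 builtin __builtin_cpu_supports.

```cpp
#include <dlfcn.h>  // POSIX dlopen / dlsym / dlclose
#include <cstddef>
#include <string>

// Signature each variant library is assumed to export with C linkage.
using sum_fn = float (*)(const float*, std::size_t);

// Map a vector-unit tag to a library file name (hypothetical naming scheme).
inline std::string library_for(const std::string& isa) {
    return "libpv_kernels_" + isa + ".so";
}

// Pick the best tag the running CPU supports; "generic" is the fallback.
inline std::string detect_isa() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return "avx2";
    if (__builtin_cpu_supports("avx")) return "avx";
#endif
    return "generic";
}

struct kernel_lib {
    void* handle = nullptr;  // null if loading failed
    sum_fn sum = nullptr;    // entry point resolved from the library
};

// Load the library for one tag and resolve its entry point. On any failure
// the returned handle is null, so the caller can retry with "generic".
inline kernel_lib load_kernels(const std::string& isa) {
    kernel_lib lib;
    lib.handle = dlopen(library_for(isa).c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!lib.handle) return lib;
    lib.sum = reinterpret_cast<sum_fn>(dlsym(lib.handle, "pv_sum"));
    if (!lib.sum) {
        dlclose(lib.handle);
        lib.handle = nullptr;
    }
    return lib;
}
```

The main program would call load_kernels(detect_isa()) at startup and fall
back to load_kernels("generic") if the handle comes back null, so a package
could ship only a subset of the libraries. On older glibc versions this
needs linking with -ldl.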
The fourth alternative is commonly implemented by building for SSE
only, since SSE is available on most machines. Yet this sacrifices
the power of better vector units and makes performance on newer
processors suboptimal, so it's not really a good option.
I'd like some advice on how to proceed to get my code to be easily
packaged and deployed under the constraints I've outlined. If it helps
you to understand more clearly what this is all about, my project (a
viewer for panoramic images) is here:
https://bitbucket.org/kfj/pv
With regards
Kay F. Jahnke