
how best to package when using hardware vectorization with vector-unit specific code?



Hi group!

I have code which optionally makes use of hardware vectorization. This is done generically by using Vc:

https://github.com/VcDevel/Vc

When compiling with Vc, the resultant machine code is for a specific vector unit only, like AVX or SSE. There are several possible ways of dealing with these processor-dependent binaries:

- create a set of complete target-specific executables and select which one to deploy/run on the target machine

- create a single binary with all variants linked in, calling only target-specific code at run time

- create a set of shared libraries, deploy one or all and load the target-specific one at run-time

- create only one compromise binary using some commonly available vector unit

The first alternative is attractive because each binary is small and simple, but each one runs only on its specific target. That requires either a way to do target-specific deployment or installing all variants, which leaves superfluous binaries cluttering .../bin. So far I have only seen architecture-dependent packages, and I haven't managed to figure out whether the package installation process can be made specific enough to deploy only the code for a particular vector unit. I'd like to go down this path if possible.

The second alternative would require case-switching inside the code, which has to be maintained for every new vector unit that comes along. It also makes the build more complex: I would need a set of object files with individually named versions of the code to be called, and linking might be difficult. It would also bloat the binary. Yet it would provide a single binary that works on all targets, so packaging should be simpler.
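As a rough sketch of what that case-switching might look like, the following picks one variant at startup and calls it through a function pointer. It assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available; on other compilers or architectures it falls back to a generic variant. The `kernel_*` namespaces, `Kernel`, `select_kernel` and `dispatch_sum` are hypothetical names made up for illustration. In a real build each variant would live in its own translation unit compiled with the matching flags and would contain the vectorized code; here they all share one scalar body so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>

namespace detail {
// Shared scalar body standing in for the real vectorized code.
inline float scalar_sum(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}
}  // namespace detail

// Hypothetical per-target variants. Assumption: in a real project each of
// these would be a separate object file built with e.g. -mavx2, -mavx, -msse2.
namespace kernel_avx2    { inline float sum(const float* p, std::size_t n) { return detail::scalar_sum(p, n); } }
namespace kernel_avx     { inline float sum(const float* p, std::size_t n) { return detail::scalar_sum(p, n); } }
namespace kernel_sse2    { inline float sum(const float* p, std::size_t n) { return detail::scalar_sum(p, n); } }
namespace kernel_generic { inline float sum(const float* p, std::size_t n) { return detail::scalar_sum(p, n); } }

using sum_fn = float (*)(const float*, std::size_t);

struct Kernel {
    const char* name;  // label of the selected variant
    sum_fn sum;        // entry point of the selected variant
};

// Query the CPU once and return the best variant it supports.
inline Kernel select_kernel() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return {"avx2", kernel_avx2::sum};
    if (__builtin_cpu_supports("avx"))  return {"avx",  kernel_avx::sum};
    if (__builtin_cpu_supports("sse2")) return {"sse2", kernel_sse2::sum};
#endif
    return {"generic", kernel_generic::sum};
}

// The selection runs only on first use; later calls reuse the result.
inline const Kernel& active_kernel() {
    static const Kernel k = select_kernel();
    return k;
}

// Single entry point used by the rest of the program.
inline float dispatch_sum(const float* p, std::size_t n) {
    const Kernel& k = active_kernel();
    assert(k.sum != nullptr);
    return k.sum(p, n);
}
```

The maintenance cost mentioned above shows up in `select_kernel`: every new vector unit means one more namespace, one more object file in the build, and one more line in the selection chain.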

The third alternative is also an interesting option, but it would require splitting the code into the 'main' program and a library doing the number crunching. The case-switching inside the 'main' code would also need maintenance over time, and deploying every version of the .so would waste space.
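A minimal sketch of the 'main' side of such a split, assuming a POSIX system with `dlopen`/`dlsym`: build an ordered list of candidate library names, best vector unit first, and load the first one that is actually installed. The library names (`libpvkernel_*.so`), the exported symbol `pv_sum` and the helper names are all hypothetical, and the CPU feature flags are passed in as plain booleans so the detection method is left open.

```cpp
#include <dlfcn.h>  // POSIX dlopen, dlsym, dlclose

#include <cstddef>
#include <string>
#include <vector>

using sum_fn = float (*)(const float*, std::size_t);

// Candidate shared libraries, most capable vector unit first.
// Assumption: an SSE2 build is always deployed as the baseline.
inline std::vector<std::string> candidate_libraries(bool has_avx2, bool has_avx) {
    std::vector<std::string> names;
    if (has_avx2) names.push_back("libpvkernel_avx2.so");
    if (has_avx)  names.push_back("libpvkernel_avx.so");
    names.push_back("libpvkernel_sse2.so");
    return names;
}

struct LoadedKernel {
    void* handle;  // dlopen handle, nullptr if nothing could be loaded
    sum_fn sum;    // resolved entry point, nullptr if nothing could be loaded
};

// Try each candidate in order and return the first that loads and exports
// the expected symbol. Variants that were not deployed are simply skipped.
inline LoadedKernel load_kernel(const std::vector<std::string>& names) {
    for (const std::string& name : names) {
        void* handle = dlopen(name.c_str(), RTLD_NOW | RTLD_LOCAL);
        if (!handle) continue;
        void* sym = dlsym(handle, "pv_sum");  // hypothetical C-linkage export
        if (sym) return LoadedKernel{handle, reinterpret_cast<sum_fn>(sym)};
        dlclose(handle);
    }
    return LoadedKernel{nullptr, nullptr};
}
```

Because missing variants are skipped, the same 'main' binary works whether a package ships all of the libraries or only the one matching the target machine. On older glibc versions the program may need to be linked with `-ldl`.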

The fourth alternative is often used: build for SSE only, since SSE instructions are available on most machines. Yet this sacrifices the power of better vector units and leaves performance on newer processors suboptimal, so it's not really a good option.

I'd like some advice on how to proceed to get my code to be easily packaged and deployed under the constraints I've outlined. If it helps you to understand more clearly what this is all about, my project (a viewer for panoramic images) is here:

https://bitbucket.org/kfj/pv

With regards

Kay F. Jahnke

