On 10.05.2017 at 19:42, Wookey wrote:
On 2017-05-10 18:01 +0200, Kay F. Jahnke wrote:

    #! /bin/bash
    for instruction_set in mmx sse sse2 sse3 ssse3 sse4 sse4a sse4.1 sse4.2 avx avx2 avx512f avx512pf avx512er avx512cd
    do
      if [[ $( lscpu | grep $instruction_set ) ]]
      then
        bestarch=$instruction_set
      fi
    done

Because it is install-time rather than run-time detection, it would go wrong in a range of circumstances, so it is frowned upon (installing images, hardware which gets upgraded, keeping the OS image, cross-installing, NFS-mounting, containers, etc.).
Okay, I did not think of that. Kind of a show-stopper for my simple-minded plan.
But yes, it is possible in the absence of more correct solutions. It would be much better to run such a 'choose-binary' script at runtime and have it run the right one as that would work in all the circumstances I can think of offhand.
So why don't I use a run-time chooser then? I am currently doing that with the shell script above, simply passing on all arguments to a call to myprogram_$bestarch. Of course this would have to be extended to be more comprehensive, but it could always fall back on the scalar variant if it can't positively identify a friendly environment. Alternatively I could have C++ code doing the job. What's better? Can I rely on a specific shell, and on lscpu, being present on all systems Debian runs on? Or is there possibly even a ready-made solution just for the purpose?
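For illustration, a run-time chooser along those lines might look roughly like the sketch below. It is only a sketch under assumptions that are not from this thread: it reads the flags line of /proc/cpuinfo instead of calling lscpu (so it is Linux-specific), it assumes just the four builds scalar, sse, avx and avx2, and the directory in VARIANT_DIR is a made-up placeholder, not a recommendation for where the binaries should live. Anything it cannot positively identify ends up on the scalar build.

```shell
#!/bin/bash
# Run-time chooser sketch: pick the most capable installed build.
# Assumptions (not from the thread): Linux /proc/cpuinfo, the variant
# names scalar/sse/avx/avx2, and the VARIANT_DIR location.

# Print the best variant supported by the given space-separated flag list.
pick_variant() {
  local flags=" $1 " best=scalar candidate
  for candidate in sse avx avx2; do        # ordered weakest to strongest
    case "$flags" in
      *" $candidate "*) best=$candidate ;; # whole-word match, so "sse2" is not "sse"
    esac
  done
  echo "$best"
}

# Print the flag list of the first CPU; prints nothing if it is unavailable
# (no /proc, or an architecture without a "flags" line).
cpu_flags() {
  [ -r /proc/cpuinfo ] || return 0
  sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1
}

# Choose a variant and replace this process with it, passing all arguments on.
main() {
  local dir=${VARIANT_DIR:-/usr/lib/myprogram}   # hypothetical location
  local variant
  variant=$(pick_variant "$(cpu_flags)")
  # Fall back to the scalar build if the chosen binary is not installed.
  [ -x "$dir/myprogram_$variant" ] || variant=scalar
  exec "$dir/myprogram_$variant" "$@"
}

# In an installed wrapper the last line would be:
# main "$@"
```

Matching whole words against the flag list avoids the substring hits that a plain grep for "sse" or "avx" would produce, and keeping the detection in a function that takes the flag list as an argument makes it easy to try out by hand, e.g. `pick_variant "fpu sse sse2 avx"`.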
How fat would 15 versions of the program be (on x86)? Do you really need all 15? Might a subset suffice?
Not really 15, I think even four would be good enough - if the processor doesn't even have SSE it's a bit slow for that kind of application anyway, so I'd say at least SSE, AVX, and AVX2, plus the scalar version as a runs-everywhere fallback. And the code itself is slim; I prefer to link libVc.a in statically for performance reasons, but SFML and vigra can be linked dynamically. The binaries are ca. 1MB each.
Where should the architecture-dependent binaries go in the target's file system, to make sure they're not in the execution path accidentally?
Does this software only work on x86 or does it work on other architectures, with other vector units (neon, altivec)? Remember to consider more than just x86 when pondering this issue.
I am using Vc, so whatever Vc supports, my software supports as well. Vc is a generic C++ library to abstract away the architecture. I've coded so that my program will also run without using the vector units.
OK. Looks like neon support is 'in development'. And you can run on non-vectorised hardware (but only very slowly).
In fact non-vectorized performance isn't all that bad, the program is very memory-bound with lots of DDA and irregular, possibly widely scattered memory access patterns. Vectorization speeds up the processing pipelines only - AVX2 roughly halves my rendering times.
Kay