Re: how best to package when using hardware vectorization with vector-unit specific code?
On 05/11/2017 09:33 AM, Kay F. Jahnke wrote:
> Or is there possibly even a ready-made solution
> just for the purpose?
Well, even if FMV (function multiversioning) doesn't work for your
code because of the way it is organized, you could definitely use it
to dispatch to the executables.
To elaborate on that:
1) Install the actual binaries under
/usr/lib/packagename/executable.$VARIANT
(As others said elsewhere: don't create _all_ possible variants
that your code supports, just create those that make the most
sense. On amd64 that would probably be sse2 (part of the base ISA),
sse4.2 and avx2.)
2) Use the following program (not tested, just as an idea) to dispatch
to the actual programs you want to use:
/* Build with g++: GCC's FMV via the target attribute is a C++ feature. */
#include <cstdio>     /* perror */
#include <unistd.h>   /* execv */

/* Fallback version, used when no more specific version matches the CPU. */
__attribute__ ((target ("default")))
void run(char **argv)
{
#if defined(__amd64__)
    /* sse2 is part of the amd64 base ISA, so it is the default there. */
    execv("/usr/lib/packagename/executable.sse2", argv);
    perror("Could not execute /usr/lib/packagename/executable.sse2");
#elif defined(SOME_OTHER_ARCH_WITH_VECTOR_BY_DEFAULT)
    /* Placeholder macro: substitute the real test for some other
       architecture that has a vector unit by default. */
    execv("/usr/lib/packagename/executable.some_other_vector_isa", argv);
    perror("Could not execute /usr/lib/packagename/executable.some_other_vector_isa");
#else
    execv("/usr/lib/packagename/executable.nonvectorized", argv);
    perror("Could not execute /usr/lib/packagename/executable.nonvectorized");
#endif
}

#if defined(__i386__)
__attribute__ ((target ("sse2")))
void run(char **argv)
{
    execv("/usr/lib/packagename/executable.sse2", argv);
    perror("Could not execute /usr/lib/packagename/executable.sse2");
}
#endif

#if defined(__amd64__)
__attribute__ ((target ("avx2")))
void run(char **argv)
{
    execv("/usr/lib/packagename/executable.avx2", argv);
    perror("Could not execute /usr/lib/packagename/executable.avx2");
}
#endif

#if defined(__arm__)
/* Note: GCC's documentation describes FMV for x86 targets, so check
   whether your compiler accepts this version on ARM. */
__attribute__ ((target ("fpu=neon-vfpv3")))
void run(char **argv)
{
    execv("/usr/lib/packagename/executable.neon", argv);
    perror("Could not execute /usr/lib/packagename/executable.neon");
}
#endif

int main(int, char **argv)
{
    run(argv);
    /* execv only returns on failure, so reaching this point is an error. */
    return 1;
}
This way you don't have to work out how to check for CPU flags
yourself: the compiler generates that check for you. I believe the
above structure (or something very similar) will also stay
maintainable in the future.
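
For comparison, a hand-written check might look roughly like the
sketch below. It assumes GCC's x86 builtins __builtin_cpu_init and
__builtin_cpu_supports; variant_suffix and variant_path are
hypothetical helper names made up for illustration, and the path
layout is the one from step 1.

```cpp
#include <string>

// Hypothetical helper (not part of the dispatcher above): picks the
// variant suffix by querying the CPU directly, which is roughly the
// decision the FMV dispatch makes for you.
std::string variant_suffix()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                  // set up the CPU feature data
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    if (__builtin_cpu_supports("sse2"))
        return "sse2";
#endif
    return "nonvectorized";                // fallback for everything else
}

// Hypothetical helper: builds the full path using the step 1 layout.
std::string variant_path()
{
    return "/usr/lib/packagename/executable." + variant_suffix();
}
```

A dispatcher's main() could then call
execv(variant_path().c_str(), argv). The FMV version gets the same
effect without spelling out the checks by hand.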
(Also note that GCC 4.8 already supports this kind of FMV, based on
the target attribute; what GCC 6 added was target_clones.)
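
To illustrate the difference, here is a minimal sketch of
target_clones, assuming GCC 6 or later on x86 Linux (where ifunc is
available). The function sum is a made-up example kernel, not code
from this thread, and the attribute is guarded so that other
compilers and architectures simply build a single plain version.

```cpp
#include <cstddef>

// With target_clones the function body is written once; the compiler
// emits one clone per listed target plus a resolver that selects a
// clone at load time. The guard is an assumption about where the
// attribute is usable.
#if defined(__GNUC__) && !defined(__clang__) && defined(__linux__) && \
    (defined(__x86_64__) || defined(__i386__))
__attribute__ ((target_clones ("avx2", "sse4.2", "default")))
#endif
float sum(const float *data, std::size_t n)
{
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```

This only helps when the vectorized work lives in a few functions
like this one; if the code is not organized that way, dispatching
whole executables as described above remains the simpler route.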
Regards,
Christian