
Re: shapeit4 and AVX2



Hi all,

I agree that most CPUs used for scientific computing these days support AVX2. However, that is going to change. For instance, Apple is switching to ARM-based CPUs. MacBooks are quite popular among bioinformaticians, so any tool author would want to support non-x86 CPUs. I also believe that other manufacturers will follow Apple's lead to ARM.

That said, how can one support different instruction sets and get the optimal performance? Well, depending on the code, it can be simple or hard.

If there is just one function which needs the boost from the instruction set, one can simply add an __attribute__((target_clones("…"))) to it [1].
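
A minimal sketch of the target_clones approach, assuming GCC on x86-64 Linux with a libc that supports ifuncs (glibc does). The function axpy and the clone list "avx2", "default" are hypothetical illustrations, not code from shapeit4. On other compilers or architectures the attribute is skipped and only the portable version is built.

```c
#include <stddef.h>

/* Assumption: target_clones needs GCC plus ifunc support, so it is only
 * enabled for GCC on x86-64 Linux here. Elsewhere the plain function is
 * compiled once for the baseline. */
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__) && !defined(__clang__)
#define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSION
#endif

/* Hypothetical hot loop: y[i] += a * x[i].
 * With the attribute, GCC emits one clone compiled for AVX2 and one for
 * the baseline, plus a resolver that picks a clone when the program is
 * loaded, based on what the CPU supports. */
MULTIVERSION
void axpy(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Callers just call axpy() as usual. The whole binary can still be built with the baseline flags, because only the marked function gets an AVX2 variant.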

However, one can also compile different sections of the code with different optimizations and then use __builtin_cpu_supports() to decide at runtime which version should be used. I used that strategy with ifuncs for phylonium [2]. This way is more difficult and I haven't seen a good writeup of it yet.
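
A rough sketch of the runtime check, assuming GCC or Clang on x86. It uses a plain function pointer instead of an ifunc to stay short, so it is not how phylonium does it. All function names are hypothetical. For brevity the AVX2 variant sits in the same file via __attribute__((target("avx2"))), whereas a real project would more likely put it in a separate source file compiled with -mavx2.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Portable version, compiled for the architecture baseline. */
static long count_matches_generic(const unsigned char *a,
                                  const unsigned char *b, size_t n)
{
    long hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (a[i] == b[i]);
    return hits;
}

#if HAVE_X86_DISPATCH
/* Same loop, but the compiler is allowed to use AVX2 in this function. */
__attribute__((target("avx2")))
static long count_matches_avx2(const unsigned char *a,
                               const unsigned char *b, size_t n)
{
    long hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (a[i] == b[i]);
    return hits;
}
#endif

typedef long (*count_fn)(const unsigned char *, const unsigned char *, size_t);

/* Ask the CPU once which variant is safe to run. */
static count_fn pick_count_matches(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return count_matches_avx2;
#endif
    return count_matches_generic;
}

/* Public entry point: resolves the variant on first use, then forwards. */
long count_matches(const unsigned char *a, const unsigned char *b, size_t n)
{
    static count_fn impl = NULL;
    if (impl == NULL)
        impl = pick_count_matches();
    return impl(a, b, n);
}
```

An ifunc moves the selection into the dynamic loader, which avoids the pointer check on every call. The lazy initialisation above is not synchronised. Every thread would pick the same variant, but a multithreaded program should still resolve the pointer once at startup.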

Unfortunately, both ways require quite a bit of work from the upstream author. If someone knows a better way, let me know.

Best,
Fabian


1: https://lwn.net/Articles/691932/
2: https://salsa.debian.org/med-team/phylonium



On 05.11.20 15:02, Dylan Aïssi wrote:
Dear Giulio,

I am CCing the public Debian Med mailing list and Michael Crusoe who
can help to improve the Debian package in order to have an AVX2 binary
optimized.

On Thu, 5 Nov 2020 at 15:20, Giulio Genovese
<giulio.genovese@gmail.com> wrote:

I have noticed you are the author of the following patch for the shapeit4 debian package:
https://salsa.debian.org/med-team/shapeit4/-/blob/master/debian/patches/use_shared_libs.patch

I noticed that this causes shapeit4 to not use the AVX2 instruction set:
-CXXFLAG=-O3 -mavx2 -mfma
+#CXXFLAG=-O3 -mavx2 -mfma
  #Portable version without avx2 (much slower)
-#CXXFLAG=-O3
-LDFLAG=-O3
+CXXFLAG=$(CPPFLAGS) $(CXXFLAGS) -O3
+LDFLAG=$(LDFLAGS) -O3

This causes shapeit4 to run significantly slower than it could on most systems where this package would be installed.

I assume that the reason for this is that older CPUs do not support AVX2. Is it really important though to support these old systems? I doubt anybody would run such a tool on old machines. However, there might be reasons for this decision that I might be missing so I am curious to know the goals of this modification.


You are right, it is Debian policy to provide binaries that
respect the architecture baseline [1] in order to support older CPUs. And you
are also right when you say that nobody uses this tool on this kind of
outdated CPU. Personally, I don't have time to work on that, but I
guess someone else in the Debian Med team would be interested?

Best,
Dylan

[1] https://wiki.debian.org/ArchitectureSpecificsMemo#amd64


