
Re: shapeit4 and AVX2



Hi Michael,

Michael R. Crusoe, on 2020-11-09 17:29:11 +0100:
> On Mon, 9 Nov 2020 at 17:21, Étienne Mollier <etienne.mollier@mailoo.org>
> wrote:
> > I'm filing a wishlist item in the bug tracker, so that the
> > discussion does not disappear inside mail archives.  I gave a
> > try to shapeit4 autopkgtest suite with and without FMA & AVX2
> > support, but it had a run time of 1m25s in both cases on my
> > machine (Ryzen 5 3600 w/ 6 cores).  It is quite possible I
> > neglected some other bottlenecks though, but the assembler did
> > embed AVX2 instructions when I checked the build result.  Out of
> > curiosity, does anyone have figures on the performance gain for
> > that software when the extensions are available?
> >
> > Michael R. Crusoe, on 2020-11-05 21:26:30 +0100:
> > > As documented at
> > > https://wiki.debian.org/SIMDEverywhere
> >
> > shapeit4 provides a dedicated code path for "-mfma -mavx2" build
> > options, and another one for generic builds.  Is it still worth
> > using SIMDe in this particular situation?  The "use case"
> > paragraph of the wiki page seems to suggest it is not strictly
> > needed here.
> >
> 
> Given the fallback route that doesn't use SIMD, implementing our own
> is not necessary; however, compiling the FMA+AVX2 path using SIMDe on
> non-x86 archs may result in a speedup for them.

Okay, thanks for your thoughts; IIUC, given a wider instruction
set to target, the compiler may optimise some code paths in ways
that are not possible with a strict baseline.

> Would be best to get a bigger training dataset to confirm the benefit, or
> at least the lack of regression :-)

Agreed; besides, such a change may actually add quite some
entropy to the package.  I'm holding my horses until I have an
idea of the order of magnitude of the gain.

> If a performance benefit is observed, it might be interesting to see if the
> AVX-only and "lower" SIMD levels on x86 also experience a speed up.

Yep, if we can get a workload typical enough to quantify the
gain, that would be of great help in seeing which optimisations
are worth it.
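
For the measurement itself, a rough sketch of a timing helper is
given below.  It assumes the workload can be wrapped in a callable;
median_seconds() and speedup() are hypothetical names of mine, not
part of shapeit4 or of any existing tool.  The median is used so
that a single slow run does not skew the comparison.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run `work` `runs` times and return the median
// wall-clock duration in seconds (upper middle sample for even counts).
template <typename F>
double median_seconds(F&& work, std::size_t runs) {
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Ratio above 1 means the candidate build is faster than the baseline.
inline double speedup(double baseline_seconds, double candidate_seconds) {
    return baseline_seconds / candidate_seconds;
}
```

With the 1m25s I measured for both builds, speedup(85.0, 85.0)
gives 1.0, i.e. no observable gain on that test suite.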

Kind Regards,
-- 
Étienne Mollier <etienne.mollier@mailoo.org>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/3, please excuse my verbosity.
