
Re: Altivec in baseline for ppc64?



On Tue, Jul 13, 2021 at 2:20 PM Sébastien Villemot <sebastien@debian.org> wrote:
>
> On Tuesday, July 13, 2021 at 20:06 +0200, Mathieu Malaterre wrote:
> > On Tue, Jul 13, 2021 at 7:21 PM Sébastien Villemot <sebastien@debian.org> wrote:
> > > On Tuesday, July 13, 2021 at 18:56 +0200, Mathieu Malaterre wrote:
> > > >
> > > > On Tue, Jul 13, 2021 at 2:04 PM Sébastien Villemot <sebastien@debian.org> wrote:
> > > > >
> > > > > The wiki page that summarizes architecture specifics indicates
> > > > > that Altivec is included in the baseline for the ppc64 port:
> > > > > https://wiki.debian.org/ArchitectureSpecificsMemo#ppc64
> > > > >
> > > > > However my understanding is that this port supports any powerpc64 CPU,
> > > > > including some that don’t have Altivec (e.g. POWER4 or POWER5). This is
> > > > > also what the main wiki page for PPC64 says:
> > > > > https://wiki.debian.org/PPC64
> > > > >
> > > > > Can someone please clarify the situation?
> > > > >
> > > > > (I’m asking because I’m the maintainer of the openblas package, and
> > > > > knowing whether Altivec is available or not, and more generally what is
> > > > > in the baseline, is essential for proper packaging).
> > > >
> > > > I do not believe that you can do much as a packager. You cannot assume
> > > > anything about the target arch. You need to do what ffmpeg
> > > > does for avx2/sse4 on amd64: runtime detection. So
> > > > unless upstream is doing something very clever, you cannot compile BLAS
> > > > using any of the fancy altivec instructions :(
> > > >
> > > > The man page for ld.so mentions something about optimized libraries
> > > > (search for "/usr/lib/sse2/"), but this is currently not in use in
> > > > Debian (AFAIK).
> > >
> > > Actually OpenBLAS has its own runtime detection mechanism, which is
> > > used to select the best linear algebra kernel for the current CPU
> > > (those kernels are mainly written in assembly, and take advantage of
> > > available ISA extensions). This mechanism is used on several archs,
> > > including ppc64el (so at runtime, OpenBLAS chooses between a POWER8 and
> > > a POWER9 kernel; there is even a POWER10 kernel already available).
> > >
> > > However, I cannot enable this mechanism on ppc64 and powerpc, because
> > > the runtime detection only works for POWER6 and above, and my
> > > understanding is that for these two ports the baseline is lower. Hence
> > > on these two archs, only one kernel is included in the package binaries
> > > (currently POWER4 for ppc64 and PPCG4 for powerpc). For optimal
> > > performance, users should recompile OpenBLAS locally (as indicated in
> > > the package description and in README.Debian).
> >
> > There are plenty of people on this mailing list who could test/verify
> > that. Is there a quick way to check that your openblas package is
> > compiled correctly for ppc32 and ppc64 (like a verbose mode)? Did you
> > run any experiments on perotto.debian.net?
>
> perotto.debian.net is POWER8, so it’s clearly well above the baseline.
> The package runs fine there, but that does not tell anything about
> baseline violation.
>
> Verifying that the package compiled fine and passed its testsuite on
> build daemons does not give any information about baseline violation
> either, because buildds are probably above the baseline as well. FYI,
> the most recent build logs are there:
> https://buildd.debian.org/status/package.php?p=openblas&suite=experimental
> (there is a problem with powerpc in experimental; but the version in
> sid compiled).
>
> If nobody has the relevant knowledge, then the only option is to test
> the package on the oldest possible hardware. The easiest way to test it
> is to recompile it locally (since this will exercise the testsuite).
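
The runtime-detection approach discussed above (ffmpeg-style feature checks, or OpenBLAS choosing a kernel at startup) can be sketched roughly as follows. This is a minimal illustration, not OpenBLAS's actual mechanism: it assumes Linux with glibc's getauxval() and the PPC_FEATURE_HAS_ALTIVEC hwcap bit, and the names cpu_has_altivec, daxpy_baseline, daxpy_tuned and select_daxpy are hypothetical. The "tuned" kernel is only a portable stand-in for real Altivec code, so the sketch builds on any architecture.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__linux__) && (defined(__powerpc__) || defined(__powerpc64__))
#include <sys/auxv.h>
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000 /* Linux powerpc hwcap bit */
#endif
#endif

/* Ask the kernel (through the ELF auxiliary vector) whether Altivec is
   present. On any other platform this sketch conservatively says "no". */
static bool cpu_has_altivec(void)
{
#if defined(__linux__) && (defined(__powerpc__) || defined(__powerpc64__))
    return (getauxval(AT_HWCAP) & PPC_FEATURE_HAS_ALTIVEC) != 0;
#else
    return false;
#endif
}

/* y += alpha * x, the classic BLAS daxpy operation. */
typedef void (*daxpy_fn)(size_t n, double alpha, const double *x, double *y);

/* Baseline kernel: plain C that every CPU of the port can execute. */
static void daxpy_baseline(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Hypothetical placeholder for an Altivec-tuned kernel. A real one would
   use Altivec instructions and live in a separately compiled file; here it
   is merely unrolled so the sketch compiles everywhere. */
static void daxpy_tuned(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
    for (; i < n; i++) /* leftover elements */
        y[i] += alpha * x[i];
}

/* Choose a kernel once, at startup. When the feature is absent or cannot
   be detected, fall back to the baseline kernel instead of risking an
   illegal-instruction crash. */
static daxpy_fn select_daxpy(bool has_altivec)
{
    return has_altivec ? daxpy_tuned : daxpy_baseline;
}
```

Usage would look like `select_daxpy(cpu_has_altivec())(n, 2.0, x, y);`, ideally caching the returned pointer. The fallback branch is the crux of the packaging question: if detection cannot recognize a CPU (as with OpenBLAS below POWER6), only the baseline kernel is safe to ship.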

I can provide SSH access to a PowerMac G5 with Altivec. That should
test the delineation between Altivec and PWR{5-10}.

If OpenBLAS needs to do 64-bit math, then I have routines cribbed
away that perform 64-bit addition and subtraction using 32x4 vectors.
The routines have to handle carry/borrow themselves. My experience
with Crypto++ and algos like ChaCha20 demonstrates it is profitable.
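
Jeff's routines are not shown in the thread. As a hedged illustration of the general idea only, the portable C sketch below treats a 128-bit vector as four 32-bit words holding two 64-bit lanes, and propagates the carry (or borrow) from each low word into its high word by hand. This matters because Altivec as found on the G4/G5 has no 64-bit integer element add. The v32x4 type, the big-endian word layout, and the add64x2/sub64x2 names are assumptions for illustration. On real Altivec hardware the per-word carries would come from an instruction such as vec_addc (vaddcuw) rather than the scalar comparison used here.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector register: four 32-bit words.
   Assumed big-endian layout: w[0]/w[1] are the high/low halves of the
   first 64-bit lane, w[2]/w[3] those of the second lane. */
typedef struct { uint32_t w[4]; } v32x4;

/* Lane-wise 64-bit addition built only from 32-bit word operations. */
static v32x4 add64x2(v32x4 a, v32x4 b)
{
    v32x4 r;
    for (int lane = 0; lane < 2; lane++) {
        int hi = 2 * lane, lo = hi + 1;
        uint32_t low = a.w[lo] + b.w[lo];     /* wraps modulo 2^32 */
        uint32_t carry = low < a.w[lo];       /* 1 if the low word overflowed */
        r.w[lo] = low;
        r.w[hi] = a.w[hi] + b.w[hi] + carry;  /* carry into the high word */
    }
    return r;
}

/* Lane-wise 64-bit subtraction with an explicit borrow. */
static v32x4 sub64x2(v32x4 a, v32x4 b)
{
    v32x4 r;
    for (int lane = 0; lane < 2; lane++) {
        int hi = 2 * lane, lo = hi + 1;
        uint32_t borrow = a.w[lo] < b.w[lo];  /* 1 if the low word underflows */
        r.w[lo] = a.w[lo] - b.w[lo];
        r.w[hi] = a.w[hi] - b.w[hi] - borrow; /* borrow from the high word */
    }
    return r;
}
```

Each lane wraps modulo 2^64, matching `uint64_t` semantics; for example, adding 1 to the lane `{0x00000000, 0xFFFFFFFF}` yields `{0x00000001, 0x00000000}`.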

Send over your SSH public key/authorized_keys, if interested.

Jeff

