Summary: BLAS/LAPACK Ecosys Massive Update



Hi fellow developers,

Good news, especially for Debian's scientific computing users.
I shall call it a massive update, even though the whole update
was split into many tiny steps, some of which were already
finished a year ago.

BLAS and LAPACK are two classic dense linear algebra
libraries, directly or indirectly used by most scientific
computing software that involves vector and matrix operations
or basic linear algebra problems such as solving linear systems,
solving linear least-squares problems, or matrix factorization.
A glance at popcon and their reverse dependencies demonstrates
their importance: popcon > 0.1 million, with notable rdeps
such as numpy, scipy, octave, arpack, (julia), etc.

  TIPS: the typical performance bottleneck in a deep neural
  network is matrix multiplication (already mentioned here[1]),
  and that can be done by BLAS.
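
As an illustration of the operation in question, here is a minimal
pure-Python sketch of what a BLAS GEMM routine computes,
C <- alpha*A*B + beta*C. The function name and the list-of-rows layout
are my own choices for illustration; a real BLAS works on flat arrays
and is heavily optimized, so this shows only the arithmetic.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows.

    Reference sketch only: no blocking, vectorization, or threading,
    which is where real BLAS implementations get their speed.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")

    # Start from beta * c, then accumulate alpha * a * b into it.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            scaled = alpha * a[i][p]
            for j in range(n):
                out[i][j] += scaled * b[p][j]
    return out
```

The triple loop performs m*k*n multiply-add steps, which is why this
single routine tends to dominate the run time of matrix-heavy workloads.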

For a long time Debian's BLAS/LAPACK ecosystem has lacked a feature:
the libraries should be compiled in different configurations to
meet different user demands. For example, a pthread application
does not want to be linked against a library compiled with OpenMP;
a supercomputer user may want to work with a very large numerical
array that cannot be indexed by a 32-bit integer.
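
To make the 32-bit indexing limit concrete, here is a small sketch. It
assumes a flat array addressed by a single signed integer index; the
helper names are hypothetical and only serve the arithmetic.

```python
import math


def max_index(index_bits):
    """Largest value a signed integer of the given width can hold."""
    return 2 ** (index_bits - 1) - 1


def fits_index(num_elements, index_bits=32):
    """True if an element count can be expressed in a signed index of this width."""
    return num_elements <= max_index(index_bits)


def largest_square_dim(index_bits=32):
    """Largest n such that n * n elements still fit the index width."""
    return math.isqrt(max_index(index_bits))


n = largest_square_dim(32) + 1      # first square dimension that overflows
print(n, fits_index(n * n, 32), fits_index(n * n, 64))
```

Under these assumptions a square matrix only slightly larger than
46340 x 46340 already exceeds what a signed 32-bit index can express,
while a 64-bit index handles it easily.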

These are the two massive updates we are going to talk about:
(1) different threading flavours for BLAS/LAPACK implementations
(2) different indexing (32-bit vs. 64-bit) ...

In 2018 I introduced intel-mkl (the fastest CPU-based BLAS/LAPACK
implementation on the x86 architecture, yet non-free) into our archive
and registered it as an alternative for BLAS/LAPACK, and as the
first alternative for BLAS64/LAPACK64.

  FYI: intel-mkl's magical runtime dispatching library libmkl_rt.so
  supports both (32-bit, 64-bit) indexing, and supports
  (gnu openmp, intel/llvm openmp, tbb, serial/sequential) threading.

Then I introduced blis (the second fastest free CPU-based impl). It was
a new package, so it was quite convenient for me to experiment
with the envisioned ecosystem update on it:

  FYI: blis, (32bit,64bit)x(openmp,pthread,serial)
  Note that blis only provides a BLAS/CBLAS implementation; it does
  not include LAPACK.

Next I took part in GSoC 2019 with Gentoo and introduced a BLAS/LAPACK
runtime switching mechanism for Gentoo, fixing a long-standing
BLAS/LAPACK obstacle for the Gentoo community:

  https://wiki.gentoo.org/wiki/Blas-lapack-switch

After finishing GSoC, I started to patch src:lapack, which
is the pivot package of the whole BLAS/LAPACK ecosys:

  FYI: src:lapack, (32bit,64bit)x(serial)
  It is the standard Fortran implementation. No multi-threading
  support.

And subsequently src:openblas (the fastest free impl):

  FYI: openblas, (32bit,64bit)x(pthread,openmp,serial)
  It cleared the NEW queue (experimental) just several hours ago.

So far all of my planned updates are basically finished (provided
that openblas is uploaded to sid). We still need some time to
test the features and let everything stabilize, but our
BLAS/LAPACK ecosys has already entered a new era.
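
As a compact summary, the flavour matrices quoted in the FYI notes above
can be enumerated as a Cartesian product. This is only an illustrative
sketch: the "impl/indexing/threading" labels are made up for display and
are not Debian package names. intel-mkl is left out because it selects
its configuration at run time through libmkl_rt.so rather than through
separate builds.

```python
from itertools import product

# (indexing widths, threading models) per implementation, as listed in this mail.
FLAVOURS = {
    "lapack": (("32bit", "64bit"), ("serial",)),
    "blis": (("32bit", "64bit"), ("openmp", "pthread", "serial")),
    "openblas": (("32bit", "64bit"), ("pthread", "openmp", "serial")),
}


def variants(impl):
    """List every indexing x threading combination for one implementation."""
    widths, threadings = FLAVOURS[impl]
    return [f"{impl}/{w}/{t}" for w, t in product(widths, threadings)]


for impl in FLAVOURS:
    print(impl, len(variants(impl)), variants(impl))
```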

Notes for Debian Users
======================

I recommend the following BLAS/LAPACK combinations, sorted by
computation speed:

* BLAS=libmkl-rt, LAPACK=libmkl-rt (non-free)
* BLAS=openblas,  LAPACK=openblas
* BLAS=blis,      LAPACK=lapack
* BLAS=atlas,     LAPACK=? (not tested)
* BLAS=blas,      LAPACK=lapack (standard, slow)

I wrote a tool named "rover", a TUI frontend for update-alternatives,
which is already present in stable/unstable. It was originally written
to make debugging alternatives convenient, but it is also a convenient
way for users to switch alternatives.
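
For scripted inspection without a TUI, update-alternatives can also be
queried directly. The sketch below is hedged: it assumes the
alternatives are named like "libblas.so.3-<multiarch triplet>" (for
example libblas.so.3-x86_64-linux-gnu on amd64), which is not stated in
this mail, so please verify the names on your system. The helper
functions are hypothetical.

```python
import subprocess


def alternative_name(library, triplet="x86_64-linux-gnu", soversion=3):
    """Build an alternative name such as 'libblas.so.3-x86_64-linux-gnu'.

    The naming scheme is an assumption; check `update-alternatives
    --get-selections` for the names actually present.
    """
    return f"{library}.so.{soversion}-{triplet}"


def query_command(library, triplet="x86_64-linux-gnu"):
    """Argument vector for a read-only query of one alternative."""
    return ["update-alternatives", "--query", alternative_name(library, triplet)]


def parse_value(query_output):
    """Extract the currently selected path from `--query` output, if any."""
    for line in query_output.splitlines():
        if line.startswith("Value:"):
            return line.split(":", 1)[1].strip()
    return None


def current_provider(library, triplet="x86_64-linux-gnu"):
    """Run the query and return the path of the selected implementation."""
    result = subprocess.run(
        query_command(library, triplet),
        capture_output=True, text=True, check=True,
    )
    return parse_value(result.stdout)
```

Calling current_provider("libblas") on a Debian system would then report
which library currently provides BLAS; switching is still done with
rover or `update-alternatives --config`.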

Notes for Debian Developers
===========================

If you maintain packages depending on libblas.so|libblas-dev, or
liblapack.so|liblapack-dev:

  Q: should I link my package against the 64-bit variant instead?
  A: don't do that if you are not sure. However, please investigate
     this problem if your package is intended for supercomputing
     or non-trivial numerical computation.

...

I'll write documentation about this whole thing and put it somewhere,
to avoid a lengthy mail. Please look forward to a future update.

Currently we have this brief wiki page:

  https://wiki.debian.org/DebianScience/LinearAlgebraLibraries

Acknowledgement
===============

Many thanks to Sébastien Villemot, who has been maintaining
Debian's BLAS/LAPACK ecosys for many years. He reviewed nearly
every step of mine mentioned in this mail, and helped me find many
bugs and problems.

  FYI: Once upon a time, when I was not a Debian dev but merely a
  user, I was deeply impressed by Sébastien's openblas
  packaging because it contains some carefully designed rules
  targets that let the user recompile the package locally, and
  that helped me at that time. That experience definitely
  further motivated my wish to become a DD. Thank you.

Many thanks to the Debian Science team, which also provided a lot
of helpful feedback and suggestions.

Many thanks to Aron Xu for sponsoring hardware resources for
me to deal with many hard-to-compile stuff.
My mentor during the mentioned GSoC project was Benda Xu (Gentoo dev).

[1] https://people.debian.org/~lumin/debian-dl.html
    section 2

