
I like it! (Was: Summary: BLAS/LAPACK Ecosys Massive Update)



Hi Mo,

Thank you very much for generously sharing your
fast code!

I'm interested in BLAS and matrix multiplication
too.

libopenblas was reported to speed up SciKit-Learn 15X[1]!

Better yet, on the cool, privacy respecting POWER9
CPU!

Some things I wrote that you're welcome to use:

    1.) A generic shell script that reports how
        well a command of your choice scales to
        running on more and more cores.

        I think it's versatile and relevant.

        It's at

            http://loaner.com/how_fast_do_various_numbers_of_cores_run


    2.) A script intended to benchmark the
        effects of

            multiple cores,

            cache size and

            python interpreter

        is at

            http://loaner.com/multiplications_vs_speed_experiment.py

        Sample output is charted at

            http://loaner.com/multiplications_vs_speed_and_pythons_on_AMDs_Athlon.png

        It seems to me pypy3 was about 140X faster
        than python2 or 3!
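
For anyone reading without the scripts at hand, here is a minimal
Python sketch of the idea behind item 1. It is not the script at the
URL above; the function names and the default workload are made up
for illustration. It starts 1, 2, 4, ... copies of a command at once
and compares throughput against the single-copy run.

```python
import os
import subprocess
import sys
import time


def time_parallel_runs(command, copies):
    """Start `copies` instances of `command` at once; return wall-clock seconds."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(command, stdout=subprocess.DEVNULL)
        for _ in range(copies)
    ]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


def scaling_report(timings):
    """Turn {copies: seconds} into rows of (copies, seconds, throughput gain).

    Throughput gain compares work done per second with the one-copy run,
    so ideal scaling gives a gain equal to the number of copies.
    """
    base = timings[1]
    return [(n, t, n * base / t) for n, t in sorted(timings.items())]


if __name__ == "__main__":
    # Hypothetical default workload: a small CPU-bound Python loop.
    command = sys.argv[1:] or [
        sys.executable, "-c", "sum(i * i for i in range(1_000_000))"
    ]
    cores = os.cpu_count() or 1
    timings = {}
    copies = 1
    while copies <= cores:  # try 1, 2, 4, ... copies up to the core count
        timings[copies] = time_parallel_runs(command, copies)
        copies *= 2
    for n, seconds, gain in scaling_report(timings):
        print(f"{n:3d} copies  {seconds:8.3f} s  throughput x{gain:.2f}")
```

Pass the command to measure as arguments; with none given, the
built-in loop is used. A gain that flattens well below the number of
copies suggests the workload is limited by something other than core
count, such as cache or memory bandwidth.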

If it would be convenient, comfortable, and all
those good things, feel free to share your
thoughts.

Thanks,
Kingsley
            
[1] Improving performance of Phoronix benchmarks on POWER9
    https://sthbrx.github.io/blog/2018/08/15/improving-performance-of-phoronix-benchmarks-on-power9/


On 10/28/2019 08:17, Mo Zhou wrote:
> Hi fellow developers,
> 
> Good news, especially for Debian scientific computing users.
> I shall call it a massive update, even if the whole update
> was decomposed into many tiny steps where some of them
> had already been finished 1 year ago.
> 
> BLAS/LAPACK are two typical and classical dense linear algebra
> libraries, directly or indirectly used by most scientific
> computing software that involve vector & matrix operations,
> or basic linear algebra problems such as solving linear systems,
> solving linear least-square problems, or matrix factorization.
> A glance at the popcon and their reverse depends would demonstrate
> their importance: popcon > 0.1 million, and they have notable rdeps
> such as numpy, scipy, octave, arpack, (julia), etc.
> 
>   TIPS: the typical performance bottleneck in a deep neural
>   network is matrix multiplication (already mentioned here[1]).
>   And that can be done by BLAS.
> 
> For a long time Debian's BLAS/LAPACK ecosystem lacked a feature --
> the libraries should be compiled with different configurations to
> meet different user demands. For example, a pthread application
> doesn't want to be linked against the library compiled in openmp
> settings; a supercomputer user may want to deal with a super
> large numerical array that cannot be indexed by 32-bit integer.
> 
> Those are the two massive updates we are going to talk about:
> (1) different threading flavours for BLAS/LAPACK implementations
> (2) different indexing (32-bit vs. 64-bit) ...
> 
> In 2018 I introduced intel-mkl (the fastest CPU-based BLAS/LAPACK
> implementation on x86 architecture, yet non-free) to our archive
> and registered it as an alternative of BLAS/LAPACK, and as the
> first alternative of BLAS64/LAPACK64.
> 
>   FYI, intel-mkl's magical runtime dispatching library libmkl_rt.so
>   supports both (32-bit, 64-bit) indexing, and supports
>   (gnu openmp, intel/llvm openmp, tbb, serial/sequential) threading.
> 
> Then I introduced blis (second fastest CPU-based free impl). It's
> a new package, so it was convenient for me to experiment with
> the envisioned ecosystem update:
> 
>   FYI: blis: (32bit,64bit) x (openmp,pthread,serial)
>   Note that blis only provides BLAS/CBLAS implementation, not
>   including LAPACK.
> 
> Next I took part in GSoC2019/Gentoo, and introduced a BLAS/LAPACK
> runtime switching mechanism for Gentoo, fixing a long standing
> BLAS/LAPACK obstacle for Gentoo community:
> 
>   https://wiki.gentoo.org/wiki/Blas-lapack-switch
> 
> After finishing the GSoC, I started to patch src:lapack which
> is the pivot package for the whole BLAS/LAPACK ecosys:
> 
>   FYI: src:lapack, (32bit,64bit)x(serial)
>   It is the standard Fortran implementation. No multi-threading
>   support.
> 
> And subsequently src:openblas (fastest, free impl)
> 
>   FYI: openblas (32bit,64bit)x(pthread,openmp,serial)
>   Just cleared NEW queue (experimental) several hours ago.
> 
> So far all of my planned updates are basically finished (as long
> as openblas is uploaded to sid). We still need some time to
> test the features and let everything stabilize, but our
> BLAS/LAPACK ecosys has already entered a new era.
> 
> Notes for Debian Users
> ======================
> 
> I recommend the following BLAS/LAPACK combinations, sorted by
> computation speed:
> 
> * BLAS=libmkl-rt, LAPACK=libmkl-rt (non-free)
> * BLAS=openblas,  LAPACK=openblas
> * BLAS=blis,      LAPACK=lapack
> * BLAS=atlas,     LAPACK=? (not tested)
> * BLAS=blas,      LAPACK=lapack (standard, slow)
> 
> I wrote a tool named "rover", a TUI frontend of update-alternatives,
> which is already present in stable/unstable. Initially it was written
> to make debugging alternatives convenient, but it is also useful for
> users who simply want to switch alternatives.
> 
> Notes for Debian Developers
> ===========================
> 
> If you maintain packages depending on libblas.so|libblas-dev, or
> liblapack.so|liblapack-dev:
> 
>   Q: should I link my package against the 64-bit variant instead?
>   A: don't do that if you are not sure. However, please investigate
>      this problem if your package is intended for supercomputing
>      or non-trivial numerical computation.
> 
> ...
> 
> I'll write documentation about this whole thing and put it somewhere,
> to avoid a lengthy mail ... please look forward to the future update.
> 
> Currently we have this brief wiki page:
> 
>   https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
> 
> Acknowledgement
> ===============
> 
> Many thanks to Sébastien Villemot, who has been maintaining
> Debian's BLAS/LAPACK ecosys for many years. He reviewed nearly
> every step of mine mentioned in this mail, and helped me find many
> bugs and problems.
> 
>   FYI: Once upon a time when I was not a debian dev but merely a
>   user, I had been deeply impressed by Sébastien's openblas
>   packaging because it contains some carefully designed rules
>   targets for the user to recompile the package locally, and
>   that helped me at that time. That experience definitely
>   further motivated my wish to become a DD. Thank you.
> 
> Many thanks to the Debian Science team, which also provided a
> number of helpful suggestions and much useful feedback.
> 
> Many thanks to Aron Xu for sponsoring hardware resources for
> me to deal with many hard-to-compile things.
> My mentor during the mentioned GSoC project was Benda Xu (Gentoo dev).
> 
> [1] https://people.debian.org/~lumin/debian-dl.html
>     section 2
> 

-- 
Time is the fire in which we all burn.

