
priority score of libflame (LAPACK alternative)?



Hi science team,

As usual, I'd like to inform the team before registering a new lapack
implementation in our blas/lapack ecosystem. The new implementation
is called "libflame" and comes from the same upstream as BLIS:
  https://github.com/flame/libflame
Like BLIS, it is an object-based, lapack-like implementation, and it
ships a compatibility layer, called "lapack2flame", that exposes the
traditional (Fortran) lapack interface.

I noticed this library because it is part of AMD's revived math library
stack (to some extent a counterpart to MKL?): https://developer.amd.com/amd-aocl/
AMD has also upstreamed their patches to BLIS, which is a healthy sign.

My preliminary tests of single-precision SVD (sgesvd_) show a
significant improvement over both the netlib lapack and the openblas
lapack[1] implementations. The results are at the end of this mail.

Given these observations, I propose to

  * set the priority value of `libflame` (as a liblapack.so.3 provider)
    to 80,

because 1) I'm still not sure whether the libflame compat layer provides
the complete lapack ABI; 2) we have not tested it sufficiently; and 3) 80
is close to the BLIS priority values (for libblas.so.3).

---

My test code can be found in the MKL packaging:
https://salsa.debian.org/science-team/intel-mkl/blob/master/debian/tests/test-gesvd.cc
Preliminary packaging can be found here:
https://salsa.debian.org/science-team/libflame
Switching between alternatives is made easy by my tiny utility:
https://tracker.debian.org/pkg/rover

Results on Xeon Gold 6126 (sgesvd_, 512x512 matrix size):

  BLAS=openblas LAPACK=openblas -> ~560ms  # pthread
  BLAS=atlas    LAPACK=atlas    ->  N/A    # cgesvdq_ symbol not found
  
  BLAS=netlib   LAPACK=netlib   -> ~820ms
  BLAS=atlas    LAPACK=netlib   -> ~600ms
  BLAS=blis     LAPACK=netlib   -> ~560ms  # BLIS_NUM_THREADS=1
  
  BLAS=netlib   LAPACK=libflame -> ~700ms
  BLAS=atlas    LAPACK=libflame -> ~490ms
  BLAS=blis     LAPACK=libflame -> ~415ms  # BLIS_NUM_THREADS=1
  BLAS=openblas LAPACK=libflame -> ~415ms

I didn't compare against MKL (non-free); that comparison isn't needed here.

[1] openblas lapack $\approx$ netlib lapack, except for a few routines.

