
Fast blas1



Greetings,

I made a diff for the Debian blas package that adds Kazushige Goto's fast
assembler BLAS (axpy, copy, dot, gemm, gemv, and a simple ger by yours
truly which just loops over axpy).  With this package, on my 600 MHz
21164 with 2 MB cache, I get 796 MFlops for dgemm NN (matrix multiply)
of 1000x1000 by 1000x100, and 395 MFlops for dgetrf (LU decomposition)
of 1000x1000, so the LU decomposition takes 1.7 seconds!  (And this on a
$2K machine...)  The old compiled FORTRAN gave about 50 MFlops, so dgemm
is over 15x faster!
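The ger mentioned above ("just loops over axpy") works because the rank-1
update A := alpha*x*y' + A changes column j of A by (alpha*y(j))*x, which
is exactly one axpy per column.  Here is a minimal Python sketch of that
idea, not the code in the diff: it assumes column-major storage modeled as
a list of columns, and the names axpy and ger are hypothetical stand-ins
for daxpy and dger.

```python
from typing import List


def axpy(alpha: float, x: List[float], y: List[float]) -> None:
    """In-place y := alpha*x + y (the level-1 BLAS axpy operation)."""
    for i, xi in enumerate(x):
        y[i] += alpha * xi


def ger(alpha: float, x: List[float], y: List[float],
        a_cols: List[List[float]]) -> None:
    """Rank-1 update A := alpha*x*y' + A, with A stored as a list of columns.

    Column j of A changes by (alpha*y[j]) * x, so each column is one axpy.
    """
    for j, yj in enumerate(y):
        if yj != 0.0:  # a zero y[j] leaves column j unchanged, so skip it
            axpy(alpha * yj, x, a_cols[j])
```

For example, with A as three zero columns of length 2,
ger(1.0, [1.0, 2.0], [3.0, 4.0, 5.0], A) leaves A equal to
[[3.0, 6.0], [4.0, 8.0], [5.0, 10.0]].  Each axpy walks one contiguous
column, which is why this loop order suits column-major storage.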
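The timings above line up with the standard flop counts: about 2mnk for an
m-by-k times k-by-n multiply, and about (2/3)n^3 for the LU factorization
of an n-by-n matrix (leading-order term only, lower-order terms ignored).
A small Python check of that arithmetic, using only the MFlops figures
quoted above:

```python
def gemm_flops(m: int, n: int, k: int) -> float:
    """Flops for C := A*B + C, A m-by-k, B k-by-n: one multiply and one add per term."""
    return 2.0 * m * n * k


def lu_flops(n: int) -> float:
    """Leading-order flop count for LU factorization (dgetrf) of an n-by-n matrix."""
    return 2.0 * n**3 / 3.0


def seconds(flops: float, mflops: float) -> float:
    """Wall time implied by a flop count at a sustained MFlops rate."""
    return flops / (mflops * 1e6)


print(f"dgetrf 1000x1000 at 395 MFlops: {seconds(lu_flops(1000), 395):.2f} s")
print(f"dgemm 1000x1000 by 1000x100 at 796 MFlops: "
      f"{seconds(gemm_flops(1000, 100, 1000), 796):.2f} s")
print(f"dgemm speedup over 50 MFlops: {796 / 50:.1f}x")
```

Run as a script, it prints about 1.69 s for dgetrf, 0.25 s for dgemm, and
a 15.9x dgemm speedup.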

There are a couple of catches:

   * It's, um, inelegant. :-)
   * I can't figure out how to do the equivalent of "cp */*.o *.i", i.e.
     give each of those files the same name with .i instead of .o, so I
     can't make the static lib work.
   * Goto's BLAS are GPL!  So it's illegal to link them into non-GPL apps!!
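For the second catch, the effect "cp */*.o *.i" was aiming for (each
sub/foo.o copied to sub/foo.i next to it) needs a loop, since cp cannot
rewrite a suffix on its own.  In a Makefile this would normally be a shell
for loop; the Python sketch below only shows the mapping, and the function
name copy_o_to_i is hypothetical.  It copies rather than renames, matching
cp; use shutil.move instead if the .o files should go away.

```python
import shutil
from pathlib import Path
from typing import List


def copy_o_to_i(root: str = ".") -> List[Path]:
    """Copy each */*.o under root to the same name with a .i suffix.

    Mirrors the intent of `cp */*.o *.i`: sub/foo.o -> sub/foo.i.
    Returns the list of files created.
    """
    created = []
    for obj in sorted(Path(root).glob("*/*.o")):
        target = obj.with_suffix(".i")  # dgemm.o -> dgemm.i in the same directory
        shutil.copy2(obj, target)       # copy contents and timestamps
        created.append(target)
    return created
```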

There are two versions of the diff, for ev5 and ev6; the *only*
difference between them is that the ev5 version disables the CPU=EV6
declaration in gemm/Makefile.  (Maybe there should be a postinst script
which installs the right one based on arch, but this is way beyond my
abilities.)  I also put up the blas1 deb for ev5, but since the static
lib doesn't work, blas-dev is irrelevant.  It's all in
http://lyre.mit.edu/~powell/debs/ .
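The arch check such a postinst would need can be sketched as below.  A real
postinst is normally a shell script; this Python version only illustrates
the selection logic.  It assumes Alpha Linux reports a "cpu model" line in
/proc/cpuinfo with values like EV56 or EV67; that field name and format
are assumptions to verify on real hardware.  Defaulting to ev5 when the
model is unknown assumes the ev5 build also runs on newer chips.

```python
from typing import Optional


def cpu_model(cpuinfo_text: str) -> Optional[str]:
    """Return the value of the 'cpu model' line, e.g. 'EV56' (field name assumed)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "cpu model":
            return value.strip()
    return None


def pick_build(cpuinfo_text: str) -> str:
    """Return 'ev6' for EV6-or-later models (EV6, EV67, EV68, EV7), else 'ev5'."""
    model = (cpu_model(cpuinfo_text) or "").upper()
    return "ev6" if model.startswith(("EV6", "EV7")) else "ev5"
```

On the target machine it would be called as
pick_build(open("/proc/cpuinfo").read()).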

Share and enjoy, and please let me know if you find a way to make the
static lib work.

Zeen,
--
      Adam Powell                     http://lyre.mit.edu/~powell/
      Thomas B. King Assistant Professor of Materials Engineering
      77 Massachusetts Ave. Rm. 4-117         Phone (617) 452-2086
      Cambridge, MA 02139 USA                   Fax (617) 253-5418

