Fast blas1
Greetings,
I made a diff for the Debian blas package to add Kazushige Goto's fast
assembler BLAS (axpy, copy, dot, gemm, gemv, and a simple ger by yours
truly which just loops over axpy). With this package, on my 600 MHz
21164 with 2M cache, I get 796 MFlops for dgemm NN (matrix multiply)
1000x1000 by 1000x100 and 395 MFlops for dgetrf (LU decomposition)
1000x1000, so the LU decomposition takes 1.7 seconds! (And this on a
$2K machine...) The old compiled FORTRAN BLAS gave about 50 MFlops, so
dgemm is now over 15x faster!
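A rank-1 update A := alpha*x*y' + A adds a multiple of x to each column of A, so ger really is just one axpy per column. The Python sketch below shows that loop structure only; it is not taken from the diff (which calls Goto's assembler axpy), and it assumes column-major storage with leading dimension lda and unit strides.

```python
def axpy(n, alpha, x, y, y_off=0):
    """In place: y[y_off + i] += alpha * x[i] for i in 0..n-1 (unit strides)."""
    if alpha == 0.0:
        return  # quick return, as in the reference daxpy
    for i in range(n):
        y[y_off + i] += alpha * x[i]


def ger(m, n, alpha, x, y, a, lda):
    """Rank-1 update A := alpha * x * y^T + A, A stored column-major.

    Column j of A starts at a[j * lda]; adding alpha * y[j] * x to it
    is exactly one axpy, so ger is a loop of n axpys.
    """
    for j in range(n):
        axpy(m, alpha * y[j], x, a, j * lda)


if __name__ == "__main__":
    a = [0.0] * 6  # 2x3 matrix of zeros, lda = 2
    ger(2, 3, 1.0, [1.0, 2.0], [1.0, 10.0, 100.0], a, 2)
    print(a)
```

Because each axpy walks down one contiguous column, the fast assembler axpy does all the real work; a tuned ger would block for cache instead, but the loop is correct as it stands.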
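The 1.7 second figure follows from the standard leading-order flop count for LU factorization of an n x n matrix, 2n^3/3 (lower-order terms ignored). A small Python check of that arithmetic, using only the numbers quoted above:

```python
def lu_flops(n):
    """Leading-order flop count for LU factorization of an n x n matrix."""
    return 2.0 * n ** 3 / 3.0


def seconds(flops, mflops):
    """Run time in seconds at a sustained rate of `mflops` MFlops."""
    return flops / (mflops * 1e6)


if __name__ == "__main__":
    t = seconds(lu_flops(1000), 395)
    print(f"dgetrf 1000x1000 at 395 MFlops: {t:.2f} s")
    print(f"dgemm speedup over 50 MFlops: {796 / 50:.1f}x")
```

This gives about 6.7e8 flops and 1.69 s for the LU, and a ratio of 15.9 for dgemm against the old 50 MFlops.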
There are a couple of catches:
* It's, um, inelegant. :-)
* I can't figure out how to do the equivalent of cp */*.o *.i, i.e.
give each of those files the same name with .i instead of .o, so I
can't make the static lib work.
* Goto's BLAS are GPL! So it's illegal to distribute non-GPL apps
linked against them!!
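A single cp can't do the per-file rename: the shell expands *.i to whatever .i files already exist (or leaves it literal if there are none), and cp with several sources only accepts a directory as its last argument. It takes a loop over the files. A minimal Python sketch, assuming the objects sit one directory down (e.g. gemm/); copy_o_to_i is a hypothetical name, not part of the package.

```python
import glob
import os
import shutil


def copy_o_to_i(pattern="*/*.o"):
    """Copy each file matching `pattern` (e.g. gemm/dgemm.o) to the same
    path with .o replaced by .i (e.g. gemm/dgemm.i); return the new paths."""
    copied = []
    for src in sorted(glob.glob(pattern)):
        dst = os.path.splitext(src)[0] + ".i"
        shutil.copyfile(src, dst)  # copy, leaving the .o in place
        copied.append(dst)
    return copied


if __name__ == "__main__":
    for name in copy_o_to_i():
        print(name)
```

It copies rather than moves; swapping shutil.copyfile for os.rename would rename the files instead.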
There are two versions of the diff, for ev5 and ev6; the *only*
difference between them is that the ev5 version disables the CPU=EV6
declaration in gemm/Makefile. (Maybe there should be a postinst script
that installs the right one based on arch, but this is way beyond my
abilities.) I also put up the blas1 deb for ev5, but without the static
lib working, blas-dev is irrelevant. It's all at
http://lyre.mit.edu/~powell/debs/
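The selection step such a postinst would need can be sketched as follows. Debian maintainer scripts are normally shell, so this Python version only illustrates the logic; it assumes the Linux/Alpha /proc/cpuinfo format, where a line like "cpu model : EV56" names the processor, and the "ev5"/"ev6" return values are hypothetical labels for the two builds.

```python
def pick_variant(cpuinfo_text):
    """Return "ev6" for EV6-family CPUs (EV6, EV67, EV68, ...), else "ev5".

    Assumes a "cpu model : ..." line as printed by Linux on Alpha.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu model":
            return "ev6" if value.strip().upper().startswith("EV6") else "ev5"
    # No recognizable model line: fall back to ev5, since EV5 code
    # also runs on EV6 but not the other way round.
    return "ev5"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(pick_variant(f.read()))
```

Defaulting to ev5 keeps an unrecognized machine on the build that runs everywhere, at the cost of EV6 speed.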
Share and enjoy, and please let me know if you find a way to make the
static lib work.
Zeen,
--
Adam Powell http://lyre.mit.edu/~powell/
Thomas B. King Assistant Professor of Materials Engineering
77 Massachusetts Ave. Rm. 4-117 Phone (617) 452-2086
Cambridge, MA 02139 USA Fax (617) 253-5418