Re: Fast blas1
Greetings, and thanks for your work on the BLAS! Have you by chance
timed these routines against the BLAS in the ATLAS package? ATLAS
automatically tunes the BLAS for your particular hardware, and is open
source. Better still, it has hooks that let the user supply a small
number of kernel routines, which it times against its own and possibly
includes in the finished library. In fact, a few others and I are
working on that right now for the PIII, using the KNI and prefetch x86
extensions. We're seeing significant gains with these instructions,
and hope to contribute them to ATLAS soon. The interested reader can
check out
http://master.debian.org/~camm/kniblas.tgz
http://master.debian.org/~camm/mblasnt.ps.gz
http://master.debian.org/~camm/mblasger.ps.gz
http://master.debian.org/~camm/mblast.ps.gz
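
For anyone comparing timings between the two libraries, it helps to
turn wall-clock times into MFlops the same way on both sides. Below is
a minimal sketch in Python, assuming the conventional leading-order
operation counts (2*m*n*k for dgemm, 2*n^3/3 for dgetrf on a square
matrix) and ignoring lower-order terms. The function names are made up
for illustration and are not part of ATLAS or any BLAS package.

```python
def gemm_flops(m, n, k):
    """Leading-order flop count for C = A*B with A (m x k) and B (k x n)."""
    return 2.0 * m * n * k


def lu_flops(n):
    """Leading-order flop count for LU factorization of an n x n matrix."""
    return 2.0 * n ** 3 / 3.0


def mflops(flops, seconds):
    """Convert an operation count and a measured time into MFlops."""
    return flops / seconds / 1e6


def seconds_at(flops, rate_mflops):
    """Time a job of `flops` operations takes at a sustained MFlops rate."""
    return flops / (rate_mflops * 1e6)


if __name__ == "__main__":
    # Figures quoted in the message below: 395 MFlops for a 1000x1000 dgetrf.
    print(f"LU 1000x1000 at 395 MFlops: {seconds_at(lu_flops(1000), 395):.2f} s")
    # Ratio of the quoted assembler dgemm rate to the quoted Fortran rate.
    print(f"796 MFlops vs 50 MFlops: {796 / 50:.1f}x")
```

With these counts, a 1000x1000 LU at 395 MFlops comes out at about
1.69 seconds, which agrees with the 1.7 seconds quoted below.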
Take care,
Adam C Powell IV <hazelsct@mit.edu> writes:
> Greetings,
>
> I made a diff for the Debian blas package to add Kazushige Goto's fast
> assembler BLAS (axpy, copy, dot, gemm, gemv, and a simple ger by yours
> truly which just loops over axpy). With this package, on my 600 MHz
> 21164 with 2M cache in dgemm, I get 796 MFlops for dgemm NN (matrix
> multiply) 1000x1000 by 1000x100, 395 MFlops for dgetrf (LU decompose)
> 1000x1000, so the LU decomposition takes 1.7 seconds! (And this on a
> $2K machine...) The old compiled FORTRAN gave about 50 MFlops, so this
> is over 15x faster!
>
> There are a couple of catches:
>
> * It's, um, inelegant. :-)
> * I can't figure out how to do cp */*.o *.i as in, rename those files
> to the same thing with .i instead of .o, so I can't make the static
> lib work.
> * Goto's BLAS are GPL! So it's illegal to link non-GPL apps!!
>
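
On the renaming question: cp cannot rewrite the extension itself,
because the shell expands */*.o and *.i before cp ever runs, so each
file has to be copied individually. Here is one possible sketch in
Python, assuming the goal is to copy every dir/name.o to dir/name.i
next to the original; the helper name copy_with_suffix is made up for
illustration.

```python
import shutil
from pathlib import Path


def copy_with_suffix(root, old=".o", new=".i"):
    """Copy each root/<dir>/<name><old> to root/<dir>/<name><new>.

    Only looks one directory level down, like the shell pattern */*.o.
    Returns the list of newly written paths.
    """
    copied = []
    for src in sorted(Path(root).glob(f"*/*{old}")):
        dst = src.with_suffix(new)   # same directory and stem, new extension
        shutil.copy2(src, dst)       # copy contents and timestamps
        copied.append(dst)
    return copied


if __name__ == "__main__":
    # Run from the top of the build tree.
    for path in copy_with_suffix("."):
        print(path)
```

If the .i files are meant to land in a single directory instead, the
dst line is the only part that would need to change.
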
> There are two versions of the diff, for ev5 and ev6, the *only*
> difference between them is the ev5 version disables the CPU=EV6
> declaration in gemm/Makefile. (Maybe there should be a postinst script
> which installs the right one based on arch, but this is way beyond my
> abilities.) I also put up the blas1 deb for ev5, but without the static
> lib working blas-dev is irrelevant. It's all in
> http://lyre.mit.edu/~powell/debs/ .
>
> Share and enjoy- and please let me know if you find a way to make static
> work.
>
> Zeen,
> --
> Adam Powell http://lyre.mit.edu/~powell/
> Thomas B. King Assistant Professor of Materials Engineering
> 77 Massachusetts Ave. Rm. 4-117 Phone (617) 452-2086
> Cambridge, MA 02139 USA Fax (617) 253-5418
>
--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah