
Re: Fast blas1


Adam C Powell IV <hazelsct@mit.edu> writes:

> (Ancient thread, but here goes...)
> Adam C Powell IV wrote:
> > Camm Maguire wrote:
> >
> > > Greetings, and thanks for your work on the blas!
> >
> > Thanks, but it's really not my work, Kazushige Goto deserves the credit.  I just
> > spent a couple of hours plugging it in.
> >
> > > Have you by chance timed these routines against the atlas package blas?  Atlas
> > > automatically tunes the blas for your particular hardware, and is open source.
> >
> > No, but I'll give it a try; I suspect the Goto routines (dgemm in particular) will
> > be at least twice as fast because he does some really interesting unrolling
> > things, and it's written in very tight assembler.
> Okay, I was wrong.  I finally ran some tests with the FORTRAN BLAS, Atlas, and Goto's
> dgemm.  Goto's is faster, but not by nearly as much as I had thought.  My little
> program uses dgemm (matrix multiply), dgetrf (LU decompose) and dtrsm
> (back-substitution), and a weighted average gave the following results (in MFlop/s):
> Platform  FORTRAN  atlas  Goto's
> ev5          53     331    550
> ev6         191     681    830

Thanks for these!  Just a note that Clint Whaley of atlas fame has
opened up atlas development: user-contributed routines, including
hand-written assembler, can now be timed alongside the built-in
routines and incorporated if they prove faster.  Goto has already
contributed code, so atlas may well become even more comprehensive
in its per-platform performance tuning.
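
To make the "time it alongside and keep it if superior" idea concrete,
here is a rough sketch in Python.  It is not atlas's actual tuning
code: the two kernels and the names mflops and pick_fastest are
hypothetical stand-ins, and the rate assumes the usual count of
2*n^3 floating-point operations for an n-by-n matrix multiply.

```python
import time

def matmul_ijk(a, b, n):
    """Reference kernel: plain triple loop in i-j-k order."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

def matmul_ikj(a, b, n):
    """Candidate kernel: same product, i-k-j order (walks rows of b)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ci = c[i]
        for k in range(n):
            aik = a[i][k]
            bk = b[k]
            for j in range(n):
                ci[j] += aik * bk[j]
    return c

def mflops(kernel, a, b, n, repeats=3):
    """Best-of-`repeats` rate in MFlop/s, assuming 2*n^3 flops per call."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        kernel(a, b, n)
        best = min(best, time.perf_counter() - t0)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    return 2.0 * n ** 3 / best / 1e6

def pick_fastest(kernels, n=40):
    """Time every kernel on the same inputs; return (winner, all rates)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    rates = {name: mflops(k, a, b, n) for name, k in kernels.items()}
    return max(rates, key=rates.get), rates

if __name__ == "__main__":
    winner, rates = pick_fastest({"ijk": matmul_ijk, "ikj": matmul_ikj})
    for name, rate in rates.items():
        print(f"{name}: {rate:.1f} MFlop/s")
    print("selected:", winner)
```

Which kernel wins, and by how much, depends on the machine and the
matrix size, which is of course the whole reason to time the
candidates per platform instead of guessing.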

Take care,

Camm Maguire			     			camm@enhanced.com
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
