Re: Fast blas1
Adam C Powell IV <firstname.lastname@example.org> writes:
> (Ancient thread, but here goes...)
> Adam C Powell IV wrote:
> > Camm Maguire wrote:
> > > Greetings, and thanks for your work on the blas!
> > Thanks, but it's really not my work, Kazushige Goto deserves the credit. I just
> > spent a couple of hours plugging it in.
> > > Have you by chance timed these routines against the atlas package blas? Atlas
> > > automatically tunes the blas for your particular hardware, and is open source.
> > No, but I'll give it a try; I suspect the Goto routines (dgemm in particular) will
> > be at least twice as fast because he does some really interesting unrolling
> > things, and it's written in very tight assembler.
> Okay, I was wrong. I finally ran some tests with the FORTRAN BLAS, Atlas, and Goto's
> dgemm. Goto's is faster, but not by nearly as much as I had thought. My little
> program uses dgemm (matrix multiply), dgetrf (LU decompose) and dtrsm
> (back-substitution), and a weighted average gave the following results (in MFlop/s):
> Platform   FORTRAN   atlas   Goto's
> ev5             53     331      550
> ev6            191     681      830
Thanks for these! Just a note that Clint Whaley of atlas fame has
opened up atlas development, making it possible for user-contributed
routines (assembler included) to be timed alongside the built-in
routines and incorporated if superior. Goto has already contributed
code, so in the future atlas may become even more comprehensive in its
per-platform performance tuning.
Camm Maguire email@example.com
"The earth is but one country, and mankind its citizens." -- Baha'u'llah