Re: developer release 3.1.2

To: R Clint Whaley <rwhaley@cs.utk.edu>
Cc: atlas@cs.utk.edu, atlas-comm@cs.utk.edu, debian-beowulf@lists.debian.org
Subject: Re: developer release 3.1.2
From: Camm Maguire <camm@enhanced.com>
Date: 26 Jul 2000 11:51:22 -0400
Message-id: <54aef4dh51.fsf@intech9.enhanced.com>
In-reply-to: R Clint Whaley's message of "Sat, 22 Jul 2000 14:48:05 -0400 (EDT)"
References: <200007221848.OAA00238@nala.cs.utk.edu>

Greetings!

1) It looks as though the prefetch-assisted double precision level2
   routines will max out at about 50% above standard atlas.  Transpose:
   94 -> 140, Notrans: 67 -> 97.  dger remains to be completed.
   Basically, I just looked at the assembler the compiler generated for
   atlas and added prefetch instructions.

   So the rule of thumb appears to be: SIMD +50%, prefetch +50%,
   both +100%.

2) If anyone has a handy reference for the Athlon SIMD instructions, I
   think these routines will port over to that platform with minimal
   changes.  We even have an Athlon that I can try out :-)

3) I do hope we can find a solution for distributed atlas binaries.  I
   know the idea is for the user to build atlas on each platform they
   will use, and that the current tree will skip any routines that
   fail to compile on a given platform (e.g. if there is no SIMD
   support).  Serious users will no doubt do this.

   But I do maintain an atlas package for Debian, and I've found that,
   while the distributed library is obviously not completely optimal,
   it is very frequently significantly better than the reference blas,
   and gives new users a quick way to try atlas out to see if it's
   worth their while.  (The Debian package provides an atlas drop-in
   shared library replacement for the standard blas, so one can
   compare performance gains at runtime simply by setting the
   LD_LIBRARY_PATH environment variable.)

   One solution is to compile several versions covering the most
   common platforms.  Perhaps this is best.  However, this strategy
   has the potential for confusion, both at compile time, and for
   users inadvertently installing the wrong binary, getting a crash,
   and filing a bug.  Another option is to have a flag somewhere in
   the build process specifying "compiled code only" or some such.
   Then we could leave the SIMD gains to the serious users and maybe
   provide a note to this effect in the docs accompanying the package.
   I really don't know what to do, I just thought I'd mention it.

4) Do we have an idea as to when we might want to release a
   SIMD-enhanced atlas, say in Debian?  Is there any word on the most
   important level3 front?
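As a rough illustration of the prefetch idea in item 1 (a C sketch, not the hand-edited assembler described there): a transposed dgemv computes each y[j] as a dot product down one column of A, so a prefetch issued a fixed distance ahead of the element being read can hide memory latency.  The sketch assumes GCC's __builtin_prefetch, column-major storage, 64-byte cache lines, and a hypothetical prefetch distance PF_DIST; dgemv_t_prefetch is a hypothetical name, not an ATLAS routine.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in doubles; the best value is
   machine-dependent and is only an assumption here. */
#define PF_DIST 64
/* Assumed cache line of 64 bytes = 8 doubles: prefetch once per line. */
#define LINE_DOUBLES 8

/* y := A^T * x for a column-major m-by-n matrix A with leading dimension
   lda.  Each y[j] is the dot product of column j of A with x.  While
   streaming down column j, ask the CPU to start fetching the data that
   will be needed PF_DIST elements later. */
static void dgemv_t_prefetch(size_t m, size_t n, const double *A,
                             size_t lda, const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        const double *col = A + j * lda;
        double sum = 0.0;
        for (size_t i = 0; i < m; i++) {
            /* Read-only prefetch with no temporal locality (locality 0),
               since each element of A is used exactly once in GEMV.
               x is reused for every column, so it is not prefetched. */
            if (i % LINE_DOUBLES == 0 && i + PF_DIST < m)
                __builtin_prefetch(&col[i + PF_DIST], 0, 0);
            sum += col[i] * x[i];
        }
        y[j] = sum;
    }
}
```

The bound check keeps the prefetch address inside the column; the distance and the one-prefetch-per-line choice are the knobs one would tune per machine.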

Take care,   

R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Camm,
> 
> I just returned from vacation; thanks a lot for the complex stuff. 
> Looks like GEMV is faster than GEMM for complex; I will look into this,
> see if we want a GEMV-based GEMM for this platform, until the emmerald stuff
> comes through :)  My guess is no (short vector lengths), but worth a shot,
> I guess . . .
> 
> Anyway, I haven't scoped it yet, but I'll get it in to a developer
> release ASAP (I'm working right now mostly on getting Antoine's
> pthreads stuff in) . . .
> 
> As to your question, I do not believe inlining is a problem when
> the entire lib is compiled with the same flags (I have never experienced
> the problems with inlining you spoke about); the only tricky thing
> I know of is that you need to make damn sure to prototype routines
> when they contain single precision scalars; not doing so can cause
> seg or bus faults . . .
> 
> Thanks,
> Clint
> 
> 
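A minimal C sketch of the prototyping point in the quoted text, assuming standard C semantics: when no prototype is in scope at a call site, the default argument promotions pass a float argument as a double.  If the function is defined with a float parameter, the call is undefined behavior; on a stack-based calling convention such as 32-bit x86, the extra four bytes shift every later argument, so a pointer like x below is read from the wrong slot, which is one way to get a seg or bus fault.  The routine name sscal_sketch is hypothetical, not an ATLAS or BLAS entry point.

```c
#include <stddef.h>

/* Prototype in scope before any call: alpha is passed as a float and the
   caller and callee agree on the argument layout.  In a real library this
   would live in a header included by every caller. */
void sscal_sketch(size_t n, float alpha, float *x);

/* x := alpha * x, a single precision scaling with a float scalar. */
void sscal_sketch(size_t n, float alpha, float *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;
}
```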

-- 
Camm Maguire			     			camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah