Re: lapack and blas
"James A. Treacy" <email@example.com> writes:
> I should start by mentioning that Sue is orphaning the lapack packages.
> One of these days I'll convince her to actually send mail to the wnpp
> stating such.
> Once she has officially orphaned them, I will send a note offering to
> take them over.
Wonderful! Do you think this will be done before November?
> Are you aware that the current lapack packages contain both static and shared
> libs(*)? The static libs are not compiled with -fPIC, though. As I'm sure
> you are aware, -fPIC ties up a register, which can cause a substantial(**)
> performance hit on register-starved architectures like Intel x86.
I'm sorry to have overlooked the lapack situation. Yes, I see that
now. I've been spending more time on scalapack, and just assumed
(incorrectly) that they were packaged the same.
As for the performance with -fPIC, yes, I was thinking the same thing
after I sent the original post. So, if we want both the highest-performing
options and the user-invoked, per-system blas tuning feature provided by
atlas, we have the following choices:
1) Have the separate blas package ship static no -fPIC, static -fPIC,
and shared -fPIC versions. Then atlas can unpack the static -fPIC
library with ar, copy in its own modules, and rebuild the static -fPIC
and shared -fPIC versions, preferably with separate filenames, using
the alternatives system to choose between them. Disadvantage: no
static no -fPIC optimized version can be made this way.
2) Have the atlas package duplicate the source code from the blas
package. Then we can forget about static -fPIC versions entirely.
Disadvantage: two packages with the same source component. Maybe not a
problem? This seems better than 1, IMHO. Advantage: this package
is already ready (minus the alternatives)!
3) Maybe only have one blas/atlas package, supplying the reference
implementation, a pre-optimized generic implementation, and the
script to optimize to the user's system, with switching done with
alternatives. In other words, do we really ever want to ship just
the reference implementation by itself? Wouldn't the *alternative*
of an optimized version always be preferable? Advantage: I guess
this package is ready too.
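Options 1) and 3) both rest on two mechanisms: rebuilding a static archive with atlas's modules swapped in, and letting the alternatives system pick which library is live. What follows is only a toy Python model under stated assumptions, not real packaging code. Archives are plain dicts mapping member name to contents. The replacement rule approximates what `ar r` does by default (same-named members replaced in place, new names appended at the end). Selection approximates update-alternatives in automatic mode, where the highest priority wins. All member names, paths and priorities are hypothetical.

```python
"""Toy model of the packaging mechanics in options 1) and 3).

Assumptions: archives are insertion-ordered dicts of member name -> contents;
no real .a files or update-alternatives calls are involved. All member names,
paths and priorities are hypothetical.
"""
from __future__ import annotations


def replace_members(archive: dict[str, str], modules: dict[str, str]) -> dict[str, str]:
    """Merge `modules` into a copy of `archive`, roughly like `ar r`:
    an existing member is replaced in place, a new one is appended."""
    rebuilt = dict(archive)  # leave the caller's archive untouched
    for name, contents in modules.items():
        rebuilt[name] = contents  # an existing key keeps its position
    return rebuilt


def auto_select(alternatives: dict[str, int]) -> str:
    """Return the path with the highest priority, as update-alternatives
    does in automatic mode."""
    return max(alternatives, key=lambda path: alternatives[path])


if __name__ == "__main__":
    # Hypothetical reference archive and hypothetical atlas modules.
    reference = {"daxpy.o": "reference", "ddot.o": "reference", "dgemm.o": "reference"}
    tuned = replace_members(
        reference, {"dgemm.o": "atlas-tuned", "ATL_dgemm_kernel.o": "atlas-tuned"}
    )
    print(list(tuned))  # ['daxpy.o', 'ddot.o', 'dgemm.o', 'ATL_dgemm_kernel.o']

    # Hypothetical alternatives with priorities; the atlas build wins.
    choices = {"/usr/lib/blas/libblas.so.2": 10, "/usr/lib/atlas/libblas.so.2": 20}
    print(auto_select(choices))  # /usr/lib/atlas/libblas.so.2
```

Because replacement is by member name, applying a fresh set of tuned modules to an already atlas-modified archive simply overwrites them again, provided the tuned modules keep the same names (itself an assumption here).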
> Perhaps a bit more discussion before a decision is reached. Do you have
> any information on the performance of
> atlas-modified static blas, versus
> atlas-modified static blas compiled using -fPIC?
> How portable are the tuned libraries to other machines? For example, if the
> atlas-modified blas libraries are generated on a P133, how well would those
> libs perform on a PII (with a much larger cache) compared to a set of
> libraries tuned specifically for that configuration?
I'm doing some benchmarks now, and hope to report shortly.
> If they aren't very portable and -fPIC isn't a big hit, then we could
> install atlas and use the postinst to automatically tune the user's
> machine. I believe this is what you suggested in your opening paragraph.
Actually, I think you were right to try to avoid this. I've tried to
describe 1-3) in a way that the user will always have a no -fPIC
option. Question: How does this reserved register thing work? It's
not a reserved register *per* shared lib, is it? Just one for any
shared libraries at all? If so, what about libc and libm? Do people
running real code statically compile them in too?
> Can atlas fix an atlas modified version of blas? I'm thinking here of shipping
> the blas package with an atlas modified library which could then be tuned
> as per the previous paragraph if a user wishes. This would give an
> (possibly) improved generic blas library while still allowing power users
> to tune it for fastest performance on their machine.
This is a good idea. I think this is equivalent to 3) above. If we
agree on 3), then I'd be happy to send you the package I have, or, if
you'd prefer, I could maintain it and you maintain the rest of
> If we have good answers to the above, the best course of action
> should be clear.
> > Thanks again! I hope you don't mind me cc'ing this note to the
> > debian-beowulf mailing list, to solicit some additional feedback.
> No problem. Since the blas routines are critical to many programs we need
> to ship the fastest ones we can. More feedback is good.
> Jay Treacy
Thanks so much for your valuable input!
Camm Maguire firstname.lastname@example.org
"The earth is but one country, and mankind its citizens." -- Baha'u'llah