Re: BLAS recommendations
Greetings!
Emil Briggs <briggs@tick.physics.ncsu.edu> writes:
> >
> > a) What is the "best" way to add BLAS to a 'wulfish cluster of PPro's
> >and PII's? I say best instead of fastest or cheapest or GPL'd-est to
> >allow for a variety of personal interpretations of the word best. My
> >own would be very much GPL, SRPM based (and fastest possible within that
> >constraint) if such a thing were possible, but I'd love to hear about
> >fastest under any circumstances, best for sale, and so forth as well.
> > b) How is BLAS documented? Is it of the "if you have to ask, you
> >shouldn't be using it" variety? Books? A manual somewhere?
> > c) Is there e.g. a website for software written to make use of BLAS?
> > d) Are any of the above really ignorant questions? If so please
> >Enlighten me...
> >
>
> For recommendations I would say that ATLAS is a great choice for the
> Level3 stuff. If you've got an Athlon then my libs are OK
> for the Level1. No recommendations for Level2 (Our codes don't
> require much Level2 stuff so it hasn't been a high priority for me).
>
>
What he said! Debian has an atlas package that supplies an optimized,
binary-compatible BLAS library, which can be selected at runtime via
LD_LIBRARY_PATH. Just to put some numbers behind what Emil said, here
are results on our cluster of 16 PII 350s:
xd3blastst: (compares atlas and reference blas on a single processor)
DGEMM
TEST  TA  TB    M    N    K  alpha   beta    Time  Mflop  SpUp  PASS
====  ==  ==  ===  ===  ===  =====  =====  ======  =====  ====  ====
   2   N   N  200  200  200    1.0    0.0    0.33   48.5  1.00   ---
   2   N   N  200  200  200    1.0    0.0    0.07  228.6  4.71   YES
   3   N   N  300  300  300    1.0    0.0    1.64   32.9  1.00   ---
   3   N   N  300  300  300    1.0    0.0    0.20  270.0  8.20   YES
   4   N   N  400  400  400    1.0    0.0    4.22   30.3  1.00   ---
   4   N   N  400  400  400    1.0    0.0    0.50  256.0  8.44   YES
   5   N   N  500  500  500    1.0    0.0    8.22   30.4  1.00   ---
   5   N   N  500  500  500    1.0    0.0    0.95  263.2  8.65   YES
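The Mflop and SpUp columns in the DGEMM table are consistent with the usual
2*M*N*K operation count for a matrix multiply divided by the elapsed time.
A minimal Python sketch, assuming that operation count (the helper name
dgemm_mflops is made up for illustration), reproduces the printed figures
to within the rounding of the times:

```python
def dgemm_mflops(m, n, k, seconds):
    """Mflop/s for one DGEMM call, assuming 2*m*n*k floating-point operations."""
    return 2.0 * m * n * k / seconds / 1.0e6

# (matrix size, reference BLAS time, ATLAS time), copied from the DGEMM table
runs = [(200, 0.33, 0.07), (300, 1.64, 0.20), (400, 4.22, 0.50), (500, 8.22, 0.95)]

for size, t_ref, t_atlas in runs:
    ref = dgemm_mflops(size, size, size, t_ref)
    atlas = dgemm_mflops(size, size, size, t_atlas)
    # Speedup is simply the ratio of the two elapsed times.
    print(f"N={size}: reference {ref:6.1f}  atlas {atlas:6.1f}  "
          f"speedup {t_ref / t_atlas:4.2f}")
```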
mpi xdlutime
Simple Timer for ScaLAPACK routine PDGESV
Number of processors used: 16
TIME      N   NB    P    Q    LU Time   Sol Time   MFLOP/S  Residual    CHECK
----  -----  ---  ---  ---  ---------  ---------  --------  --------  -------
WALL    100   64    4    4       0.36       0.02      1.79  0.010005   PASSED
WALL   4096   64    4    4     104.68       0.66    435.14  0.003757   PASSED
LD_LIBRARY_PATH=/usr/lib/atlas mpi -x LD_LIBRARY_PATH xdlutime
Simple Timer for ScaLAPACK routine PDGESV
Number of processors used: 16
TIME      N   NB    P    Q    LU Time   Sol Time   MFLOP/S  Residual    CHECK
----  -----  ---  ---  ---  ---------  ---------  --------  --------  -------
WALL    100   64    4    4       0.15       0.08      2.99  0.004056   PASSED
WALL   4096   64    4    4      32.83       0.65   1369.22  0.000857   PASSED
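To compare the two PDGESV runs at N=4096, one can take the ratio of the
total (LU plus solve) wall times and check it against the ratio of the
reported MFLOP/S figures. The following is only a sketch over the numbers
printed above; the dictionary layout and variable names are my own:

```python
# N=4096 rows from the two xdlutime runs (reference BLAS vs. ATLAS)
reference = {"lu": 104.68, "sol": 0.66, "mflops": 435.14}
atlas = {"lu": 32.83, "sol": 0.65, "mflops": 1369.22}

total_ref = reference["lu"] + reference["sol"]
total_atlas = atlas["lu"] + atlas["sol"]

time_speedup = total_ref / total_atlas              # ratio of wall times
rate_speedup = atlas["mflops"] / reference["mflops"]  # ratio of reported rates

print(f"speedup from wall times: {time_speedup:.2f}")
print(f"speedup from MFLOP/S:    {rate_speedup:.2f}")
```

Both ratios come out a little above 3, so swapping in ATLAS roughly
triples the throughput of the distributed solve on this cluster.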
mpi /usr/lib/scalapack/xdpblas3tim-lam
ScaLAPACK Level-3 PBLAS timing program.
'Intel iPSC/860 hypercube, gamma model.'
Tests of the real double precision Level-3 PBLAS
Number of Tests          : 1
Number of process grids  : 1
P                        : 4
Q                        : 4
Alpha                    : 2.00000
Beta                     : 3.00000
Routines to be tested    : PDGEMM  ... Yes
                           PDSYMM  ... Yes
                           PDSYRK  ... Yes
                           PDSYR2K ... Yes
                           PDTRAN  ... Yes
                           PDTRMM  ... Yes
                           PDTRSM  ... Yes
Tests started.
Test number 1 started on a 4 x 4 process grid.
-------------------------------------------------------------------
      M      N      K   SIDE   UPLO TRANSA TRANSB   DIAG
-------------------------------------------------------------------
   1024   1024   1024      L      U      N      N      N
-------------------------------------------------------------------
     IA     JA     MA     NA    MBA    NBA  RSRCA  CSRCA
-------------------------------------------------------------------
      1      1   1024   1024     64     64      0      0
-------------------------------------------------------------------
     IB     JB     MB     NB    MBB    NBB  RSRCB  CSRCB
-------------------------------------------------------------------
      1      1   1024   1024     64     64      0      0
-------------------------------------------------------------------
     IC     JC     MC     NC    MBC    NBC  RSRCC  CSRCC
-------------------------------------------------------------------
      1      1   1024   1024     64     64      0      0
-------------------------------------------------------------------
          WALL time (s)    WALL Mflops   CPU time (s)     CPU Mflops
PDGEMM            3.583        599.366         -1.000          0.000
PDSYMM            5.254        408.731         -1.000          0.000
PDSYRK            2.896        371.170         -1.000          0.000
PDSYR2K           5.760        372.799         -1.000          0.000
PDTRAN            0.128          0.000         -1.000          0.000
PDTRMM            2.862        375.217         -1.000          0.000
PDTRSM            5.727          0.000         -1.000          0.000
-------------------------------------------------------------------
Test number 1 completed.
End of Tests.
LD_LIBRARY_PATH=/usr/lib/atlas mpi -x LD_LIBRARY_PATH /usr/lib/scalapack/xdpblas3tim-lam
ScaLAPACK Level-3 PBLAS timing program.
'Intel iPSC/860 hypercube, gamma model.'
Tests of the real double precision Level-3 PBLAS
Number of Tests          : 1
Number of process grids  : 1
P                        : 4
Q                        : 4
Alpha                    : 2.00000
Beta                     : 3.00000
Routines to be tested    : PDGEMM  ... Yes
                           PDSYMM  ... Yes
                           PDSYRK  ... Yes
                           PDSYR2K ... Yes
                           PDTRAN  ... Yes
                           PDTRMM  ... Yes
                           PDTRSM  ... Yes
Tests started.
Test number 1 started on a 4 x 4 process grid.
-------------------------------------------------------------------
      M      N      K   SIDE   UPLO TRANSA TRANSB   DIAG
-------------------------------------------------------------------
   1024   1024   1024      L      U      N      N      N
-------------------------------------------------------------------
     IA     JA     MA     NA    MBA    NBA  RSRCA  CSRCA
-------------------------------------------------------------------
      1      1   1024   1024     64     64      0      0
-------------------------------------------------------------------
     IB     JB     MB     NB    MBB    NBB  RSRCB  CSRCB
-------------------------------------------------------------------
      1      1   1024   1024     64     64      0      0
-------------------------------------------------------------------
     IC     JC     MC     NC    MBC    NBC  RSRCC  CSRCC
-------------------------------------------------------------------
      1      1   1024   1024     64     64      0      0
-------------------------------------------------------------------
          WALL time (s)    WALL Mflops   CPU time (s)     CPU Mflops
PDGEMM            1.396       1537.853         -1.000          0.000
PDSYMM            2.904        739.497         -1.000          0.000
PDSYRK            1.271        845.707         -1.000          0.000
PDSYR2K           2.482        865.127         -1.000          0.000
PDTRAN            0.128          0.000         -1.000          0.000
PDTRMM            1.476        727.324         -1.000          0.000
PDTRSM            3.456          0.000         -1.000          0.000
-------------------------------------------------------------------
Test number 1 completed.
End of Tests.
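For a per-routine view of the two PBLAS runs, the wall times can be
divided directly. This is just a sketch over the numbers printed above
(first run with the reference BLAS, second with ATLAS); the dictionary
names are mine:

```python
# WALL time (s) per routine, copied from the two xdpblas3tim-lam runs
reference = {"PDGEMM": 3.583, "PDSYMM": 5.254, "PDSYRK": 2.896, "PDSYR2K": 5.760,
             "PDTRAN": 0.128, "PDTRMM": 2.862, "PDTRSM": 5.727}
atlas = {"PDGEMM": 1.396, "PDSYMM": 2.904, "PDSYRK": 1.271, "PDSYR2K": 2.482,
         "PDTRAN": 0.128, "PDTRMM": 1.476, "PDTRSM": 3.456}

# Speedup of each routine when ATLAS replaces the reference BLAS
speedups = {name: reference[name] / atlas[name] for name in reference}

for name, factor in speedups.items():
    print(f"{name:8s} {factor:4.2f}x")
```

PDGEMM gains the most, while PDTRAN does not change at all, which is
unsurprising if the transpose is dominated by data movement rather than
by arithmetic in the BLAS.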
When using a 4096x4096 matrix, the atlas results approach 3 gigaflops
on the pdgemm.
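As a rough sanity check on that figure, one can compare the implied
per-processor rate with the best single-processor ATLAS DGEMM rate from
the xd3blastst table. This is a back-of-the-envelope sketch only: it
assumes the aggregate rate is exactly 3000 Mflops, which the "approach"
wording makes an upper bound, and the variable names are mine:

```python
processors = 16
single_cpu_mflops = 270.0  # best single-processor ATLAS DGEMM rate reported above
cluster_mflops = 3000.0    # assumption: "approach 3 gigaflops" treated as 3000 Mflops

per_cpu = cluster_mflops / processors
efficiency = per_cpu / single_cpu_mflops  # fraction of the single-CPU rate retained

print(f"per-processor rate: {per_cpu:.1f} Mflops")
print(f"parallel efficiency: {efficiency:.0%}")
```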
Take care,
> Regards
> Emil
--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah