
Re: i586 debian port?



On Sat, Jan 09, 1999 at 02:09:30PM +0000, Russell Coker wrote:
> 
> There's probably not much need for a Pentium optimised version of "cp" or any
> other similar programs which spend most of their time waiting for IO and aren't
> run that much.  For a Pentium optimised system you would only need different
> versions of programs that matter, libc6, gzip, bzip2, povray (which are all
> reasonably small), and maybe X servers (a significant amount of data).
> Someone with a good knowledge of pgcc could probably produce a Pentium
> optimised version of all these packages in an afternoon.
> 

   I recently ran some tests comparing pgcc with egcc on floating-point
intensive code (simple 100x100 matrix multiplies).  On Pentium, Pentium II,
and Pentium Pro, egcc code ranged from 3% faster to 1% slower than pgcc code,
which is within the typical deviation between runs.  Sometimes gcc (2.7.2)
was up there as well, though it was typically about 10% slower.  Curiously,
the Pentium MMX did perform ~10% better under pgcc than under egcc, but only
on the static matrix code.  The dynamically allocated matrix code was no
faster.
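
   The benchmark source is not included in this message.  As a rough
illustration only, a naive multiply in the two variants mentioned above
(static arrays versus dynamically allocated rows) might look like the sketch
below.  All names (matmul_static, matmul_dynamic, matrix_alloc, matrix_free)
and the row-pointer layout are assumptions, not the code that was timed.

```c
#include <assert.h>
#include <stdlib.h>

#define N 100  /* matrix dimension used in the tests described above */

/* Static storage: sizes and addresses are known at compile time. */
static double sa[N][N], sb[N][N], sc[N][N];

/* Naive triple loop on the static matrices: sc = sa * sb. */
void matmul_static(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += sa[i][k] * sb[k][j];
            sc[i][j] = sum;
        }
}

/* Allocate a zeroed n x n matrix as row pointers into one contiguous
 * block.  Returns NULL on allocation failure.  (Hypothetical helper.) */
double **matrix_alloc(int n)
{
    assert(n > 0);
    double **m = malloc((size_t)n * sizeof *m);
    if (m == NULL)
        return NULL;
    m[0] = calloc((size_t)n * (size_t)n, sizeof **m);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (int i = 1; i < n; i++)
        m[i] = m[0] + (size_t)i * (size_t)n;
    return m;
}

/* Release a matrix obtained from matrix_alloc(). */
void matrix_free(double **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}

/* The same naive triple loop, but every access goes through a row
 * pointer that is only known at run time: c = a * b. */
void matmul_dynamic(int n, double **a, double **b, double **c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}
```

   In the static version the compiler sees fixed addresses and loop bounds;
in the dynamic version it sees neither, which is one plausible reason the two
could respond differently to compiler-specific scheduling.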
   An AMD K6-2/300 performed comparably to the Pentium 166 MMX (both with
SDRAM).  Cyrix was not tested this time, but in previous tests it was
comparable to a Pentium at about half the clock rate.  For floating point
performance stick to Intel architectures, especially PII and PPro.  pgcc with
k6 flags did not perform as well as *any* of the other compilers (pgcc mppro,
-mpentium, egcc, or gcc) on the AMD.  The latest egcc with -O6 has the pgcc
code scheduler built in.  Architecture-specific scheduling and opcodes did
not improve run times significantly on any architecture aside from the
Pentium MMX.
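
   Differences of a few percent only mean something when compared against
the spread between repeated runs.  A minimal timing harness along these lines
could be used to record that spread; it is a sketch using the standard
clock() call, and time_runs, struct timing, and dummy_kernel are hypothetical
names rather than part of the tests reported here.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Any benchmark kernel taking no arguments, e.g. a matrix multiply. */
typedef void (*kernel_fn)(void);

/* Fastest and slowest CPU time, in seconds, over a set of runs. */
struct timing {
    double best;
    double worst;
};

/* Placeholder workload so the sketch is self-contained; a real test
 * would pass the matrix multiply under study instead. */
static volatile double sink;
static void dummy_kernel(void)
{
    double s = 0.0;
    for (int i = 0; i < 100000; i++)
        s += i * 0.5;
    sink = s;
}

/* Run fn `runs` times and report the best and worst CPU time, so a
 * difference between compilers can be compared against the spread. */
struct timing time_runs(kernel_fn fn, int runs)
{
    assert(fn != NULL && runs > 0);
    struct timing t = { 0.0, 0.0 };
    for (int r = 0; r < runs; r++) {
        clock_t start = clock();
        fn();
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (r == 0 || secs < t.best)
            t.best = secs;
        if (r == 0 || secs > t.worst)
            t.worst = secs;
    }
    return t;
}
```

   If two compilers' best times differ by less than the gap between best and
worst for a single compiler, the difference is probably noise.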
   The example programs and libraries listed above are actually
predominantly integer codes (even povray, which spends a lot of time
wandering around linked lists compared to floating point ops).  It's unlikely
they'd improve even as much as the floating point code under a pgcc compile,
though this should be tested.  I think I did benchmark gzip once, but have
misplaced the results.  IIRC gzip was insensitive to the compiler and
optimization flags; the C code has already been hand tuned.
   Strangely, the best time was on a Pentium II with pgcc -mk6
optimizations.  It was 1% better than egcc.  Debian's BLAS routines are half
as fast as naively coded matrix multiplies (I wanted to test the compilers,
not the CPUs).  ASCI-Red's hand-coded assembly BLAS is almost twice as fast
as the code produced by our compilers, and 4 times as fast as our BLAS
library.  A slightly less naive matrix multiply (hand unrolling to improve
cache hits) improved run times by 25% on Intel.  On an SGI this less naive
code was several times faster than the naive code (RISC, with lots of
registers).
The non-Intel architectures might show similar speedups.
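
   The message does not say which loops were unrolled.  One common form,
shown here purely as a sketch, unrolls the column loop by four so that each
loaded element of the left-hand matrix is reused for four outputs held in
registers.  The function name matmul_unrolled4, the flat row-major layout,
and the restriction that n be a multiple of 4 are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* c = a * b for n x n row-major matrices stored as flat arrays.
 * The j loop is unrolled by 4: a[i][k] is loaded once and used for
 * four adjacent columns of b, with four running sums in registers.
 * Assumes n is a positive multiple of 4 (true for n = 100). */
void matmul_unrolled4(int n, const double *a, const double *b, double *c)
{
    assert(n > 0 && n % 4 == 0);
    for (int i = 0; i < n; i++) {
        const double *ai = a + (size_t)i * (size_t)n;
        for (int j = 0; j < n; j += 4) {
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
            for (int k = 0; k < n; k++) {
                double aik = ai[k];
                const double *bk = b + (size_t)k * (size_t)n + j;
                s0 += aik * bk[0];
                s1 += aik * bk[1];
                s2 += aik * bk[2];
                s3 += aik * bk[3];
            }
            double *cij = c + (size_t)i * (size_t)n + j;
            cij[0] = s0;
            cij[1] = s1;
            cij[2] = s2;
            cij[3] = s3;
        }
    }
}
```

   Each output element is still summed in the same k order as the naive
loop, so the results are identical; only the memory access pattern and
register use change.  The four accumulators are also where a register-rich
RISC CPU has more room to work with than an x86.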

-- 
Dr. Drake Diedrich, Research Officer - Computing, (02)6279-8302
John Curtin School of Medical Research, Australian National University 0200
Replies to other than Drake.Diedrich@anu.edu.au will be routed off-planet

