Re: i586 debian port?



These numbers aren't worth much unless you also rebuilt libc6 to be pentium
optimized. There was a long thread on this over the summer. I actually built
all of the base system and a few other programs pentium optimized and saw a
speedup of at least 5% (20% has been claimed).

Drake Diedrich wrote:
> On Sat, Jan 09, 1999 at 02:09:30PM +0000, Russell Coker wrote:
> > 
> > There's probably not much need for a Pentium optimised version of "cp" or any
> > other similar programs which spend most of their time waiting for IO and aren't
> > run that much.  For a Pentium optimised system you would only need different
> > versions of programs that matter, libc6, gzip, bzip2, povray (which are all
> > reasonably small), and maybe X servers (a significant amount of data).
> > Someone with a good knowledge of pgcc could probably produce a Pentium
> > optimised version of all these packages in an afternoon.
> > 
> 
>    I did some tests recently comparing pgcc with egcc for some floating
> point intensive code (simple 100x100 matrix multiplies).  On Pentium,
> Pentium II, and Pentium Pro, egcc code was between 3% faster and 1% slower
> than pgcc code, within the typical deviation between runs.  Sometimes gcc
> (2.7.2) was up there as well, though it was typically about 10% slower.
> Curiously, the Pentium MMX did perform ~10% better under pgcc than under
> egcc, but only on the static matrix code.  The dynamically allocated matrix
> code was no faster.
>    An AMD K6-2/300 performed comparably to the pentium 166 MMX (both SDRAM).
> Cyrix was not tested this time, but on previous tests it was comparable to a
> pentium at about half the clock rate.  For floating point performance stick
> to Intel architectures, especially pII and PPro.  pgcc with k6 flags did not
> perform as well as *any* of the other compilers (pgcc mppro, -mpentium,
> egcc, or gcc) on the AMD.  The latest egcc with -O6 has the pgcc code
> scheduler built in.  Architecture specific scheduling and op codes did not
> improve run times significantly on any architecture aside from the Pentium
> MMX.
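
For anyone who wants to repeat this kind of comparison, here is a rough
sketch of what a "static" versus "dynamically allocated" naive multiply
might look like. The benchmark code itself was not posted, so everything
here (the names, the fill values, timing with clock()) is my assumption,
not a reconstruction of the code that was actually timed.

```c
#include <assert.h>
#include <time.h>

#define N 100  /* matrix size mentioned in the post */

/* Static matrices: base addresses and row stride are compile-time constants. */
double sa[N][N], sb[N][N], sc[N][N];

/* Arbitrary but reproducible fill values (small integers, so sums are exact). */
double elem_a(int i, int j) { return (double)((i + 2 * j) % 7); }
double elem_b(int i, int j) { return (double)((3 * i + j) % 5); }

/* Fill the static input matrices. */
void init_static(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            sa[i][j] = elem_a(i, j);
            sb[i][j] = elem_b(i, j);
        }
}

/* Fill caller-allocated row-major n x n inputs with the same values. */
void init_dynamic(double *a, double *b, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            a[i * n + j] = elem_a(i, j);
            b[i * n + j] = elem_b(i, j);
        }
}

/* Naive triple loop over the static arrays: sc = sa * sb. */
void matmul_static(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += sa[i][k] * sb[k][j];
            sc[i][j] = sum;
        }
}

/* The same loop over caller-allocated row-major storage: c = a * b. */
void matmul_dynamic(const double *a, const double *b, double *c, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* CPU seconds elapsed since an earlier clock() reading. */
double cpu_seconds_since(clock_t start)
{
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To time it, take a clock() reading, call one of the multiplies a few
thousand times in a loop, and pass the reading to cpu_seconds_since().
A single 100x100 multiply is far too short to measure on its own, and
the buffers for matmul_dynamic() would come from malloc() in a real test.
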
>    The example programs and libraries listed above are actually
> predominantly integer codes (even povray- it spends a lot of time wandering
> around linked lists compared to floating point ops).  It's unlikely they'd
> improve even as much as the floating point code under a pgcc compile- though
> this should be tested.  I think I did benchmark gzip once, but have
> misplaced the results.  IIRC gzip was insensitive to the compiler and
> optimization flags- the C code has already been hand tuned.
>    Strangely, the best time was on a Pentium II with pgcc -mk6
> optimizations.  It was 1% better than egcc.  Debian's BLAS routines are half
> as fast as naively coded matrix multiplies (I wanted to test the compilers,
> not the CPUs).  ASCI-Red's hand-coded assembly BLAS is almost twice as fast
> as the code produced by our compilers, and 4 times as fast as our BLAS
> library.  A slightly less naive matrix multiply (hand unrolling to improve
> cache hits) improved run times 25% on Intel.  On an SGI this less naive code
> was several times faster than the naive code (RISC - lots of registers).
> The non-Intel architectures might show similar speedups.
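
The "slightly less naive" code isn't shown either, so as an illustration
only, here is one common form of hand unrolling: compute four output
columns at a time, so each a[i][k] is loaded once and reused four times
and the four running sums can stay in registers. The function names and
the unroll factor of 4 are my choices and may well differ from what was
benchmarked above.

```c
#include <assert.h>

/* Reference version: naive c = a * b for row-major n x n matrices. */
void matmul_naive(const double *a, const double *b, double *c, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Hand-unrolled version: four adjacent output columns per pass. */
void matmul_unrolled4(const double *a, const double *b, double *c, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        int j = 0;

        /* Main loop: columns j .. j+3 share each load of a[i][k]. */
        for (; j + 3 < n; j += 4) {
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                const double *brow = &b[k * n + j];  /* 4 contiguous values */
                s0 += aik * brow[0];
                s1 += aik * brow[1];
                s2 += aik * brow[2];
                s3 += aik * brow[3];
            }
            c[i * n + j]     = s0;
            c[i * n + j + 1] = s1;
            c[i * n + j + 2] = s2;
            c[i * n + j + 3] = s3;
        }

        /* Remainder: leftover columns when n is not a multiple of 4. */
        for (; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

Each output element is still summed in the same k order as in the naive
loop, so the two functions should give identical results and can be
checked against each other before timing anything.
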
> 
> -- 
> Dr. Drake Diedrich, Research Officer - Computing, (02)6279-8302
> John Curtin School of Medical Research, Australian National University 0200
> Replies to other than Drake.Diedrich@anu.edu.au will be routed off-planet
> 
> 

-- 
see shy jo

