
Re: Note Pentium Optimized only really helps P5 CPUs ...



>things architecturally would be a great benefit to Debian.  But before
>everyone does the Stampede thing and compiles everything
>Pentium-optimized, be aware that Pentium optimizations only make a big,
>noticeable difference for the original and MMX Intel Pentium (not PPro,
>not P2, not P3, not K6, etc.) processors.  The big speedup is because the
>compiler/assembler reschedules integer instructions so that they can
>execute in parallel inside the Pentium's two integer pipelines (which
>have rather primitive inter-pipeline conflict/dependency resolution
>hardware).  Most CPUs since then (Cyrix 6x86MX/M2, AMD K6-series, and
>Intel P6-series including PPro, P2, P3, and Celeron) have better conflict
>and dependency analysis/resolution hardware (or can execute
>out-of-order), and don't gain much from Pentium-specific optimizations.
>In fact, most other CPUs run better with plain old i386 or i486 code than
>they do with Pentium-optimized code.
>
>That said, Pentium optimizations *should* really help out owners of the
>genuine Intel Pentium (and MMX) processors.  By virtue of the additional
>flexibility for old machines, I'd be impressed to see Debian do this --
>it would probably require resolving some of the same problems that hold
>back a true segmentation of the distribution.  So, from an infrastructure
>and making-the-distribution-amazing (as if it isn't already :)
>standpoint, this is a good thing -- from a speeding-up-your CPU
>perspective though, it's only a great thing if you have a true
>made-by-Intel Pentium or Pentium MMX.

This is only one part. Note that there *have* been changes in the speeds of
the old instructions, so some instruction combinations are faster than
others while doing *exactly* the same thing, at the cost of only larger
code or (sometimes) an extra register. Note that this is more
generation-specific than architecture-specific -- meaning that an
instruction chain which is faster on an Intel Pentium is faster on a K5 as
well.
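
To make the "same result, different instructions" point concrete, here is a
minimal sketch in C. It assumes the classic case of a multiply by a constant,
which can also be written as shifts and adds; which form is actually faster
depends on the CPU generation, and a compiler tuned for a given processor
may pick either one on its own. The function names are made up for this
illustration.

```c
#include <stdint.h>

/* Multiply by 10 with a plain multiply instruction (smaller code). */
uint32_t mul10_plain(uint32_t x)
{
    return x * 10u;
}

/* Same result built from shifts and an add: x*8 + x*2.
 * Larger code and possibly one more register, but no multiply. */
uint32_t mul10_shift(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

Both functions return identical values for every 32-bit input (unsigned
arithmetic wraps the same way in each), so the choice is purely one of
timing.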

Newer compilers take separate options to select the architecture and the
processor type for which you are compiling. You can compile a binary with,
say, the -mcpu=pentium switch to take advantage of the instruction timings
(not new instructions). This creates an executable capable of running on,
say, a 386, yet containing code chains which complete in fewer clocks on a
Pentium. In contrast, -march=pentium will *most likely* create a binary
which runs only on a Pentium or later, because it may use new
instructions.
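
One way to see the difference between the two switches is to ask the
compiler what it thinks it is targeting. The sketch below assumes GCC on
32-bit x86 and relies on its predefined macros (__i386__ through __i686__
for the instruction set, __tune_i386__ through __tune_i686__ for the
scheduling model). Other compilers may not define these, and on later GCC
releases the tuning switch is, as far as I know, spelled -mtune rather
than -mcpu. The function names are hypothetical.

```c
/* Instruction set the compiler is allowed to use (the -march side). */
const char *arch_target(void)
{
#if defined(__i686__)
    return "i686";
#elif defined(__i586__)
    return "i586";
#elif defined(__i486__)
    return "i486";
#elif defined(__i386__)
    return "i386";
#else
    return "not 32-bit x86 (or macros unavailable)";
#endif
}

/* Processor the code is scheduled for (the -mcpu / tuning side). */
const char *tune_target(void)
{
#if defined(__tune_i686__)
    return "i686";
#elif defined(__tune_i586__)
    return "i586";
#elif defined(__tune_i486__)
    return "i486";
#elif defined(__tune_i386__)
    return "i386";
#else
    return "generic or unknown";
#endif
}
```

Printing both strings from a small main() should show the two values moving
independently: tuning for the Pentium alone changes only tune_target(),
while -march=pentium raises arch_target() as well.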

The other thing is that Pentium-class processors benefit from larger
alignment. This is especially true of double-precision arithmetic, where
aligning data to an 8-byte boundary can give you up to a 100% performance
boost (compared to 4-byte-aligned data). Note that this would be a waste of
space on a 386 and could even decrease performance.
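
To check what alignment your doubles actually get, something like the
following sketch can help. It is plain C99, the helper names are invented
for this example, and it only tests or adjusts addresses; it does not
measure the speed difference itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of boundary (boundary must be a power of two). */
int is_aligned(const void *p, size_t boundary)
{
    return ((uintptr_t)p & (uintptr_t)(boundary - 1)) == 0;
}

/* Round p up to the next multiple of boundary (a power of two).
 * The caller must have reserved up to boundary - 1 spare bytes. */
void *align_up(void *p, size_t boundary)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)(boundary - 1);
    return (void *)((addr + mask) & ~mask);
}
```

A buffer of doubles for which is_aligned(buf, 4) holds but is_aligned(buf, 8)
does not is exactly the 4-byte-aligned case described above; over-allocating
by 7 bytes and passing the start through align_up(p, 8) is one way to get
the 8-byte case, at the price of the wasted padding.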

Also, when compiling for a specific architecture
(386/486/Pentium/Pentium Pro), the compiler can guess which kinds of
instructions are likely to cause a register stall and avoid them -- using
an 8-bit operand doesn't matter nearly as much on a 386 as it does on a
PIII.
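
The 8-bit operand case can be illustrated at the source level, with the
caveat that this is only a sketch: whether the first function really ends
up using 8-bit registers depends on the compiler and its options, and the
function names are made up here. The idea is that an accumulator declared
as a byte invites byte-sized register operations, while a full-width
accumulator narrowed once at the end avoids them.

```c
#include <stddef.h>

/* Byte-wide accumulator: may be compiled to 8-bit register operations. */
unsigned char sum_narrow(const unsigned char *buf, size_t n)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (unsigned char)(acc + buf[i]);
    return acc;
}

/* Full-width accumulator, truncated to a byte only once at the end. */
unsigned char sum_wide(const unsigned char *buf, size_t n)
{
    unsigned int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += buf[i];
    return (unsigned char)acc;
}
```

Both return the sum of the bytes modulo 256, so a compiler that knows the
target dislikes partial-register writes is free to turn the first form into
the second.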

As for the separate package, IMHO the gains of having a separate Pentium
distribution are not enough to justify the waste of space. A good way to
use the speed would be to ship separate packages of the apps where the
optimization *really* counts -- such as gzip/bzip2/tar and some others.

--
Robert Varga <varga@hq.alert.sk>

