On Sun, 2005-09-18 at 13:18 +1000, Hamish Moffatt wrote:
> On Sat, Sep 17, 2005 at 06:49:55PM -0700, lordSauron wrote:
> > Pentium 4s use a 21-stage pipeline or something like that... so they
> > take approximately 21 clock cycles to get anything done. AMD uses
> > about 7 stages (or something in that neighbourhood) so if you divide
> > 2.8 by 21 and 2.0 (my Athlon64) by 7, you get a really interesting
> > breakdown. You'll certainly find a HUGE increase in performance,
>
> That's a terrible simplification. Yes, it takes longer to get the first
> result (21 cycles versus 7) but the idea of the pipeline is that you can
> get a result every clock cycle after that.

Not only is it a simplification, it's wrong.
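To make Hamish's latency-versus-throughput point concrete, here is a minimal Python sketch of an ideal, stall-free pipeline. The first instruction needs one cycle per stage to come out the end, and after that one instruction completes every cycle, so n instructions take stages + n - 1 cycles. The stage counts and clock speeds are just the figures quoted in this thread (not measured values), and the function names are made up for illustration.

```python
def pipeline_cycles(n_instructions: int, stages: int) -> int:
    """Cycles for an ideal (stall-free) pipeline to finish n instructions.

    The first instruction needs `stages` cycles to fall out the end;
    after that, one instruction completes every cycle.
    """
    if n_instructions <= 0:
        return 0
    return stages + n_instructions - 1


def runtime_seconds(n_instructions: int, stages: int, clock_ghz: float) -> float:
    """Wall-clock time for n instructions at the given clock, ideal pipeline."""
    return pipeline_cycles(n_instructions, stages) / (clock_ghz * 1e9)


if __name__ == "__main__":
    n = 1_000_000
    # Figures quoted in the thread, used purely for illustration.
    deep = runtime_seconds(n, stages=21, clock_ghz=2.8)
    short = runtime_seconds(n, stages=7, clock_ghz=2.0)
    print(f"21 stages @ 2.8 GHz: {deep * 1e6:.1f} microseconds")
    print(f" 7 stages @ 2.0 GHz: {short * 1e6:.1f} microseconds")
```

With these numbers the 21-stage, 2.8 GHz case finishes a million instructions in about 357 microseconds versus about 500 for the 7-stage, 2.0 GHz case. The fill cost is paid only once, so in the ideal case it is the clock rate, not clock divided by stages, that dominates.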
But when you context-switch or branch, the pipeline gets flushed,
and the new instruction stream has to fill it up again.
Short pipelines, like those in the Athlon and G4, take less of a
hit from branching, but other techniques like speculative fetching
and out-of-order execution (OOE) soften that penalty somewhat.
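The flush cost can be folded into a rough cycles-per-instruction (CPI) estimate. This is a simplified sketch, not a model of any real CPU: it assumes a correctly predicted branch costs nothing, that every mispredicted branch throws away the work in flight and costs about one cycle per pipeline stage to refill, and it ignores context switches, which flush the pipeline too but happen far less often than branches. The 20% branch frequency and the mispredict rates are invented illustrative numbers; better prediction, which is what makes speculative fetching pay off, shows up here simply as a lower mispredict rate.

```python
def effective_cpi(branch_fraction: float, mispredict_rate: float,
                  flush_penalty_cycles: int, base_cpi: float = 1.0) -> float:
    """Average cycles per instruction once pipeline flushes are counted.

    Only mispredicted branches flush the pipe; each flush costs roughly
    `flush_penalty_cycles` while the new instruction stream refills it.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


def instructions_per_second(clock_ghz: float, cpi: float) -> float:
    """Sustained instruction rate for a given clock and average CPI."""
    return clock_ghz * 1e9 / cpi


if __name__ == "__main__":
    # Hypothetical workload: 20% of instructions are branches.
    branches = 0.20
    # (label, clock in GHz, stages used as the flush penalty, mispredict rate)
    cases = [
        ("deep, 10% mispredicted", 2.8, 21, 0.10),
        ("deep, 5% mispredicted", 2.8, 21, 0.05),
        ("short, 10% mispredicted", 2.0, 7, 0.10),
    ]
    for label, ghz, stages, miss in cases:
        cpi = effective_cpi(branches, miss, stages)
        ips = instructions_per_second(ghz, cpi)
        print(f"{label:24s} CPI={cpi:.2f}  {ips / 1e9:.2f} billion instr/s")
```

With these made-up numbers the deep pipeline's CPI rises to about 1.42 against 1.14 for the short one, and halving the mispredict rate brings the deep pipeline back down to about 1.21.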
And then, deep pipelines let you ramp up the clock much more easily
than short pipelines do. Don't know why, though.
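As for why: a common first-order explanation is that each instruction needs a roughly fixed amount of logic, so splitting it across more stages leaves less logic per stage and allows a shorter clock period, while every stage still pays a fixed overhead for its pipeline latch and clock skew. That overhead makes the gains shrink as the pipeline gets deeper, and combined with the flush cost of branches it means there is a sweet spot for depth. The sketch below is a toy model under those assumptions; the 4 ns logic delay, 0.2 ns per-stage overhead and 0.03 flushes per instruction are invented numbers, not figures for any real chip.

```python
def clock_ghz(stages: int, logic_delay_ns: float = 4.0,
              latch_overhead_ns: float = 0.2) -> float:
    """Clock rate of a pipeline under a simple first-order timing model.

    The instruction's total logic delay is split evenly across `stages`;
    each stage also pays a fixed overhead for its latch and clock skew.
    """
    cycle_ns = logic_delay_ns / stages + latch_overhead_ns
    return 1.0 / cycle_ns  # 1 / ns == GHz


def throughput_gips(stages: int, flushes_per_instruction: float = 0.03) -> float:
    """Billions of instructions per second once flush costs are included.

    Assumes each flush costs about `stages` cycles to refill the pipeline.
    """
    cpi = 1.0 + flushes_per_instruction * stages
    return clock_ghz(stages) / cpi


if __name__ == "__main__":
    for k in (7, 14, 21, 31, 50):
        print(f"{k:2d} stages: {clock_ghz(k):.2f} GHz, "
              f"{throughput_gips(k):.2f} billion instr/s")
    best = max(range(2, 61), key=throughput_gips)
    print("best depth under these made-up parameters:", best)
```

Under these parameters the clock keeps rising with depth but can never exceed 1 / 0.2 ns = 5 GHz, and the best overall throughput lands at a depth in the mid-20s; change the parameters and the sweet spot moves.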