
Re: Why you are wrong [Was: On linux kernel packaging issue]



On Mon, 2003-11-10 at 16:27, Adam Heath wrote:
> On Mon, 10 Nov 2003, Joe Wreschnig wrote:
> 
> > A program that is CPU-bound *and* can be encoded more efficiently will
> > benefit from compiler optimizations. Some CPU bound things just aren't
> > going to be helped much by vectorization, instruction reordering, etc. I
> > mean, integer multiply is integer multiply.
> 
> But if the target CPU supports pipelining and has multiple multiplication
> units (which means it can do them in parallel), or can do one 128-bit
> multiply or one 64-bit multiply at once, then it's more efficient to do a
> partial loop unroll, and thereby get faster code through more efficient
> parallelization.
> 
> (sorry, read Dr. Dobbs last week).

I knew someone would chime in with this. :) AIUI this is only possible
when there is no data dependency (i.e. multiply n+1 does not depend on
the result of multiply n); otherwise you still have to serialize them.
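To make the dependency point concrete, here's a minimal sketch in C (the
function names are mine, just for illustration): in the first loop the
multiplies are independent and a compiler is free to unroll and issue
several at once; in the second, each multiply needs the previous result,
so no amount of hardware parallelism helps.

```c
#include <stddef.h>

/* Independent multiplies: iteration i never reads a result from
 * iteration i-1, so a compiler may unroll this and run several
 * multiplies in parallel on a CPU with multiple multiply units. */
void scale(int *dst, const int *src, int k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Loop-carried dependency: each multiply reads the previous
 * result, so the multiplies execute serially no matter how many
 * multiply units the chip has. */
int power(int x, unsigned n)
{
    int acc = 1;
    while (n--)
        acc *= x;   /* depends on acc from the previous iteration */
    return acc;
}
```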

This is also a good example of how optimizing for one chip can slow
down another: say you've got 2 multiplication units on chip A, but only
1 on chip B. You partially unroll the loop when compiling. On A, this
helps, because you can do both multiplies at once. On B, it may slow
things down because of the greater icache usage of the unrolled loop,
or because B could be doing (e.g.) an add and a multiply at once, but
not two multiplies.
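Here's roughly what such a partial unroll looks like (my own sketch,
not anything from the kernel): unrolling by two with separate
accumulators gives chip A two independent multiplies per iteration,
while on chip B the larger loop body may buy nothing.

```c
#include <stddef.h>

/* Rolled version: one multiply per iteration. */
long dot(const int *a, const int *b, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Unrolled by two with separate accumulators: the two multiplies
 * in each iteration are independent, so a chip with two multiply
 * units can issue them together.  A chip with one unit just gets
 * a bigger loop body (more icache pressure) for no speedup. */
long dot_unrolled2(const int *a, const int *b, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += (long)a[i]     * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
    }
    if (i < n)                      /* leftover element if n is odd */
        s0 += (long)a[i] * b[i];
    return s0 + s1;
}
```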

Of course, I'm far from a compiler and chip design expert (or even
novice); this is what I remember from my classes last year. :) But it
shows how complicated optimizing compilers can get, and why you can't
say any optimization is always good/safe/faster/etc. The only truly safe
way to tell is extensive, controlled benchmarking.
-- 
Joe Wreschnig <piman@debian.org>
