
Re: Compiler opt



On Sat, Aug 28, 2004 at 02:32:16PM -0700, Karl Hegbloom wrote:
> [ I am subscribed to the list; there is no reason to Cc. ]
> 
> On Sat, 2004-08-28 at 20:25 +0100, Paul Brook wrote:
> > On Saturday 28 August 2004 19:55, Karl Hegbloom wrote:
> > > On Sat, 2004-08-28 at 12:11 +0200, Sebastian Steinlechner wrote:
> > > > On Sat, 2004-08-28 at 02:22, Karl Hegbloom wrote:
> > > > > How do I determine the default compiler optimization settings?  I'm
> > > > > wondering if it would be worthwhile to try to recompile things like
> > > > > python2.3 and python2.3-numeric with '-m64'?  Or is '-m64' already the
> > > > > default?
> > > >
> > > > If you're compiling on pure64, then you'll get 64-bit ELFs/libs (so -m64
> > > > is the default).
> > >
> > > Ok.  So then -m64 implies -march=k8 ?
> > 
> > No, but also we want binaries to work on Intel em64t cpus (right?). Using 
> > -march=k8 would prevent this (by potential use of 3dnow instructions).
> 
> Ok, so, at least potentially, recompiling things like libc6, refblas3,
> atlas3-base, and linpack3 with '-march=k8 -O3' could provide faster run
> times for maths code?

 Yes.  For some programs (as opposed to libraries) you could also compile
with -ffast-math.  That's not a good idea for libraries, because you don't
want non-IEEE math in a library that any program might link against...
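
 To make the "non-IEEE math" point concrete, here is a minimal sketch (the
function names are made up for illustration).  IEEE addition is not
associative, so the order of a sum matters.  -ffast-math implies
-funsafe-math-optimizations, which lets gcc regroup floating-point operations
like these, so a caller can no longer rely on either order being the one that
actually runs.

```c
#include <assert.h>
#include <stddef.h>

/* Sum strictly left to right: ((v[0] + v[1]) + v[2]) + ... */
double sum_forward(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc + v[i];
    return acc;
}

/* Sum strictly right to left: ((v[n-1] + v[n-2]) + ...) + v[0] */
double sum_backward(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = n; i > 0; i--)
        acc = acc + v[i - 1];
    return acc;
}
```

 Built without -ffast-math, the input {1e20, -1e20, 1.0} sums to 1.0 going
forward and to 0.0 going backward, because the 1.0 is lost when it is added
to -1e20 first.  With -ffast-math the compiler is allowed to treat the two
orders as interchangeable.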

 For best results, compile once with -fprofile-generate, run the program
with some representative input data, and then compile again with
-fprofile-use.  In the first build, gcc adds code to keep track of which
branches are usually taken, which aren't, etc.  In the second build, gcc
uses that profile data to make better code.  To do this when compiling a
Debian package, I found I had to remove stamp-configure and rebuild the
whole package; it wasn't very convenient :(
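
 As a rough sketch of that two-pass workflow: the file name, program name and
input file in the comment are placeholders, and count_above() is a made-up
function standing in for a hot loop with a lopsided branch.  The commands
assume the function is part of a complete program with its own main().

```c
/*
 * Hypothetical build steps (hot.c, hot and typical-input.txt are
 * placeholder names, not anything from a real package):
 *
 *   gcc -O2 -fprofile-generate hot.c -o hot   (instrumented build)
 *   ./hot < typical-input.txt                 (run it; profile data is saved)
 *   gcc -O2 -fprofile-use hot.c -o hot        (rebuild using the profile)
 */
#include <assert.h>
#include <stddef.h>

/* Count the entries greater than limit.  If the representative input
 * rarely exceeds limit, the profile tells gcc that the branch below is
 * usually not taken. */
size_t count_above(const int *data, size_t n, int limit)
{
    assert(data != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] > limit)
            hits++;
    }
    return hits;
}
```

 The profile only helps if the training input looks like real use; a run on
unrepresentative data can teach gcc the wrong branch directions.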

> Is it correct that both em64t and amd64 have sse2? 

 Yes.  Pentium4 has always had SSE2.  Opteron and Athlon64 have always had
SSE2 (unlike the 32-bit Athlon XP).  SSE2 is vector/scalar double-precision
floating point using the xmm registers (as opposed to the x87 FP stack).
SSE1 is single precision, and was introduced with the Pentium3.  BTW, 3DNow
was AMD's own, inferior (non-IEEE compliant) single-precision vector
extension; it actually predates SSE1.  Short answer: SSE math is better.
The FP stack sucks.
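
 For a feel of what "vector double precision in the xmm registers" means,
here is a small sketch using the SSE2 intrinsics from <emmintrin.h>.  The
function name add_doubles() is invented for this example, and the scalar
fallback is only there so the file still builds where SSE2 is not enabled.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#endif

/* out[i] = a[i] + b[i] for i in [0, n) */
void add_doubles(const double *a, const double *b, double *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#if defined(__SSE2__)
    /* Each xmm register holds two doubles, so handle two per iteration.
     * The unaligned load/store forms avoid any alignment requirement. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar loop for the leftover element, or for everything when
     * SSE2 is not available at compile time. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

 In practice you rarely need to write this by hand: in a 64-bit build gcc
already does scalar double math in the xmm registers, and the intrinsics are
only needed when you want explicit control over the vector form.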

> What exactly does -m64 buy me?
> 
> Without -march=k8, does it only use the legacy i386 register set and
> instructions, or does -m64 cause it to use them?

 -m64 makes gcc use the new x86-64 registers and instructions.  -march=k8
can be used with -m32 or with the (default) -m64 to get gcc to tune the code
for an Opteron/Athlon64, with its pipeline, rather than for the P4's longer
pipeline.  It also turns on support for all the instruction sets the CPU
supports (i.e. everything the P4 supports except sse3 (and I don't know what
that is), plus 3dnow/extended 3dnow).
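
 One way to see what a given set of flags turned on is to test gcc's
predefined macros.  A minimal sketch follows; the helper names are made up
for illustration, while __x86_64__, __SSE2__ and __3dNOW__ are macros gcc
defines when the corresponding target features are enabled.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if this file was compiled as 64-bit x86 code (e.g. with -m64). */
int built_for_x86_64(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the compiler was allowed to emit SSE2 instructions. */
int built_with_sse2(void)
{
#if defined(__SSE2__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the compiler was allowed to emit 3DNow! instructions
 * (e.g. via -march=k8), which would not run on an Intel em64t CPU. */
int built_with_3dnow(void)
{
#if defined(__3dNOW__)
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line summary of the features above. */
void print_build_features(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "x86_64=%d sse2=%d 3dnow=%d\n",
            built_for_x86_64(), built_with_sse2(), built_with_3dnow());
}
```

 Calling print_build_features(stdout) from a build with and without
-march=k8 shows the difference.  For the full list, gcc -dM -E - < /dev/null
(plus whatever -m flags you are curious about) dumps every predefined macro.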

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC


