
Re: confused about performance



Sebastian Kuzminsky wrote:
Hi folks, I just bought a pair of AMD64 systems for a work project,
and I'm confused about the performance I'm getting from them.  Both are
identically configured Dell Dimension C521 systems, with Athlon 64 X2
3800+ CPUs and 1 GB RAM.

On one I installed using the Etch (4.0r0) i386 netinst CD, then upgraded
to Lenny.  This one's running linux-image-2.6.21-1-686.

On the other I installed using the current (as of 2007-06-13) Lenny d-i
amd64 snapshot netinst CD.  This one's running linux-image-2.6.21-1-amd64.

The one with the x86 userspace and 686 kernel is faster than the one
with x86_64 userspace and amd64 kernel.  The difference is consistently
a few percent in favor of x86 over x86_64.

My only benchmark is compiling our internal source tree (mostly running
gcc, some g++, flex, bison, etc).  We're using gcc-4.1 and g++-4.1.
I've tried it with a cold disk cache and hot disk cache, in both cases
x86 is faster than x86_64.

I was expecting a win for 64 bit.  What's going on here?

64-bit has both advantages and disadvantages; for each program it
all depends on how they balance out. Test many different
CPU-intensive programs - one benchmark alone won't tell you much:

Disadvantages:
* 64-bit code uses some more memory, since pointers and longs are
  8 bytes instead of 4. More memory accesses take a little more time.
  In a borderline case, using more memory might cause more swapping,
  which is very noticeable.
* Quality differences in the compilers for 32-bit and 64-bit. This will
  likely improve a lot, given that we're seeing more and more 64-bit
  machines, and many of the 32-bit specific optimizations are already done.
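
The memory point can be checked directly. Below is a minimal sketch, not
taken from the original benchmark: `struct node` and the helper names are
made up for illustration. It reports how large a pointer-heavy structure
is on whatever target it is compiled for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical doubly linked list node, invented for this illustration. */
struct node {
    struct node *next;
    struct node *prev;
    int value;
};

/* Size of one node in bytes on the current target. */
size_t node_size(void)
{
    return sizeof(struct node);
}

/* Bytes needed to hold n nodes, ignoring allocator overhead. */
size_t list_bytes(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(struct node)); /* guard against overflow */
    return n * sizeof(struct node);
}

/* Print the sizes that differ between 32-bit and 64-bit builds. */
void print_sizes(void)
{
    printf("pointer: %zu bytes\n", sizeof(void *));
    printf("long:    %zu bytes\n", sizeof(long));
    printf("node:    %zu bytes\n", node_size());
}
```

Calling print_sizes() from a small main() on each of the two installs
should show the difference. On Linux with gcc the node is typically
12 bytes on i386 and 24 bytes on amd64, so the same list takes about
twice the memory and fills the CPU cache twice as fast.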


Advantages:
* Faster floating point, since amd64 code uses SSE2 by default
  instead of the x87 FPU.
* 64-bit code lets a program use more than about 3GB trivially.
  Such software simply can't run 32-bit.
* 16 registers instead of 8. For some programs this won't matter for timing;
  in other cases it means a many-fold speedup, as some important
  inner loops don't need to access memory at all, just those 16 registers.
  (Or smaller improvements when the loop accesses less memory thanks
   to more variables being held in registers.)
* Much faster computations on 64-bit datatypes, such as the
  "long long" type in C. Again, it depends on whether the source code
  specifies 64-bit types (or the compiler manages to do this as an
  optimization). I wrote a sudoku solver that mainly uses 64-bit
  and some 128-bit datatypes. Not surprisingly, it is several times
  faster as 64-bit code than as 32-bit. :-)
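
To illustrate the last point, here is a minimal sketch of code that works
on 64-bit values throughout. It is not the sudoku solver mentioned above;
the function names popcount64 and checksum64 are made up for this example.
On i386 every 64-bit operation has to be split into several 32-bit
instructions and the values occupy two registers each; on amd64 each one
is a single instruction on a single register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in a 64-bit word by clearing the lowest set bit
 * on each iteration until the word is zero. */
unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Fold an array of 64-bit words into one value using a 64-bit
 * multiply, shift and xor per element (unsigned wraparound is fine). */
uint64_t checksum64(const uint64_t *v, size_t n)
{
    uint64_t acc = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        acc = acc * 31 + (v[i] ^ (v[i] >> 7));
    }
    return acc;
}
```

One way to see the effect, assuming a gcc install with 32-bit multilib
support, is to call these functions in a loop over a large array and
build the program twice, once with "gcc -O2 -m32" and once with
"gcc -O2 -m64", then time both binaries on the same machine.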

Helge Hafting



