
Re: Debian for x86-64 (AMD Opteron)



On Thu, Apr 10, 2003 at 04:50:59PM +0100, Hamish Marson wrote:
> Ah right. Light dawneth. Yes, you make excellent sense. Basically ia32 
> is so hacked about & wacky (In order to be backwardly compatible) as to 
> be very slow, yet ia64 is a new instruction set with none of the baggage 
> that it had to carry around. Thus you can optimise ia64 architecture 
> better than ia32.

Yep, that's pretty much it.  But ia64 != x86-64 ... ia64 is the
instruction set of Intel's Itanium, while x86-64 is that of the AMD
Opteron/Athlon 64, and they're very different beasts.  The x86-64
architecture really is just another extension of the x86 architecture
(and so retains all the scary stuff that's been in there from the dawn
of time for backwards compatibility) ... but it does add a few nice
features that make life easier for the compiler.  IA64 would take some
time to explain in full, but in short it's completely incompatible with
x86 (of any sort), and is based on an idea called VLIW (Intel calls its
variant EPIC), where instructions are grouped into "bundles" that are
issued together and more responsibility is placed on the compiler's
instruction scheduler to extract parallelism from the instruction
stream.

> Compared to something like PowerPC (Sparc maybe? Although I don't think 
> Sparc was conceived as a 64 bit instruction set was it? I could be wrong 
> there though) where you start with a 64-bit definition and then cut it 
> back to 32-bit & so gain some optimisations which make 32-bit PowerPC 
> faster than 64-bit PowerPC (Except where you genuinely need 64bit of 
> course).

Really it's that when you're in 64-bit mode you use 64-bit operands for
many operations (particularly pointers and pointer arithmetic), and it
takes more time to do 64-bit math than 32-bit math (e.g. on Opteron a
32-bit multiply has a 3-cycle latency, but a 64-bit mul has a 5-cycle
latency).  Furthermore, in 64-bit mode you put more stress on the memory
subsystem because you're loading and storing some non-zero number of
64-bit data chunks that would have been smaller (probably 32 bits in
size) in 32-bit mode.
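
To make the memory point concrete, here is a rough sketch in C.  The
struct node type and both function names are made up for illustration,
and the estimate assumes plain natural alignment, which is what the
common ABIs do but is not guaranteed everywhere.

```c
#include <stddef.h>

/* Hypothetical list node, used only for illustration. */
struct node {
    struct node *next;
    struct node *prev;
    int value;
};

/* Size of the node as the current compiler actually lays it out. */
size_t node_size_here(void)
{
    return sizeof(struct node);
}

/*
 * Estimated node size for assumed pointer and int widths (in bytes),
 * assuming each member is naturally aligned and the struct is padded
 * up to a multiple of its widest member.
 */
size_t node_size_for(size_t ptr_bytes, size_t int_bytes)
{
    size_t raw = 2 * ptr_bytes + int_bytes;
    size_t align = ptr_bytes > int_bytes ? ptr_bytes : int_bytes;
    return (raw + align - 1) / align * align;
}
```

With 4-byte pointers and a 4-byte int, node_size_for(4, 4) gives 12
bytes; with 8-byte pointers and the same int, node_size_for(8, 4) gives
24.  Same data, twice the cache and bus traffic per node.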

All that to say that if you can do something in 32-bit mode and all
other things are equal, 32-bit operations are more efficient than 64-bit
operations in many cases ... there are fewer bits to work on.  In the
case of x86-64, all things are *not* equal :), and the extra registers
and such tend to offset the extra overhead of dealing with 64-bit
operands.  Or so AMD has said.

Also, in 64-bit mode AMD left sizeof(int)==4 so that the overhead of
64-bit integer operations isn't incurred for many code paths that don't
need it.  They also made some noise about changing the C calling
convention around and supposedly it's more efficient now, but I don't
know much in the way of details on that.
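
If you want to check which sizes your own compiler uses, something
like the sketch below would do.  The function names are made up for
illustration; "LP64" (4-byte int, 8-byte long and pointers) is the
usual name for the model the x86-64 Unix ABI uses, and "ILP32" is the
usual 32-bit x86 model.  Anything else is just reported as "other".

```c
#include <stddef.h>

/*
 * Returns a short name for the data model implied by the given type
 * sizes (in bytes).  Only the two models discussed here are named.
 */
const char *data_model(size_t int_b, size_t long_b, size_t ptr_b)
{
    if (int_b == 4 && long_b == 8 && ptr_b == 8)
        return "LP64";   /* int stays 32-bit; long and pointers are 64-bit */
    if (int_b == 4 && long_b == 4 && ptr_b == 4)
        return "ILP32";  /* int, long and pointers are all 32-bit */
    return "other";
}

/* Data model of whatever compiler and target built this file. */
const char *data_model_here(void)
{
    return data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

Code that only uses int never sees a 64-bit operand under LP64; only
long, pointers and things derived from them get wider.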

-- 
Anderson MacKay <mackay@ghs.com>
Green Hills Software -- Hardware Target Connections


