
[Bug target/30255] register spills in x87 unit need to be 80-bit, not 64




------- Comment #9 from whaley at cs dot utsa dot edu  2006-12-19 16:04 -------
Ian,

Thanks for the info.  I see I failed to consider the cross-register moves you
mentioned.  However, can't those be moved through memory, where something
destined for a 64-bit register is first written from the 80-bit reg with
round-down?  Thus, you only do the round down when you have to change register
sets.  In code compiled with -mfpmath=387, I would think that would occur
pretty much only in the function epilogue, for the return value . . .  Anyway, I
see how, depending on the framework, this may be more complicated than it
seemed.  Then again, in my compilation experience, cross-precision/type
conversions are always complicated.
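
The round-down described above can be sketched in C.  This is a minimal
illustration under stated assumptions, not GCC's spill code: it assumes GCC on
x86, where `long double` is the x87 80-bit format, and it uses a `volatile`
double only to force the value through a real 64-bit memory slot, as a spill or
a move to the other register set would.  The function name
`round_trip_loses_bits` is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if pushing a long double through a
   double-sized memory slot changes its value.  With GCC on x86, long
   double is the x87 80-bit format (LDBL_MANT_DIG == 64), so the low
   mantissa bits are rounded away; on targets where long double is the
   same as double, nothing is lost and this returns 0. */
int round_trip_loses_bits(void)
{
    /* 1 + LDBL_EPSILON needs every mantissa bit of a long double. */
    long double wide = 1.0L + LDBL_EPSILON;

    /* volatile forces an actual store to a 64-bit slot, mimicking a
       spill or a move to a 64-bit register; the conversion rounds to
       nearest. */
    volatile double narrow = (double)wide;

    /* Compare the value that came back from memory with the original. */
    return (long double)narrow != wide;
}
```

On an x87 target this returns 1, because 1 + 2^-63 rounds to exactly 1.0 as a
double.  That is the one-time rounding proposed above for a register-set
change, as opposed to paying it on every spill.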

>All in all it's pretty hard for me to get excited about better support for
>80387 when all modern x87 chips support SSE2 which is more consistent and
>faster.  See the option -mfpmath=sse.

First, it is consistent only in that it always has 64-bit precision.  This is
like preferring a car that can only achieve 30 MPH to one that can go to 60, but
only for short stretches, and must sometimes slow down to 30.  The first is
more consistent, but hardly to be preferred :)

It is certainly the case that the x87 is of decreasing importance.  However,
scalar SSE (the default with gcc on x86-64) does *not*, in general, run as fast
as the x87 on the present generation of chips (I believe the common
misconception otherwise comes from conflating vector and scalar performance; on
AMDs, even vector performance is less than x87 for double precision).

In particular, single precision scalar SSE seems to be much slower than x87
code, and double precision seems to be slightly slower *even when all 16 SSE
regs are used, in contrast to the crappy 8-reg x87 stack*.  Without proof, I
ascribe the closer double performance to the availability of movlpd, which
provides a low-cost scalar load not enjoyed by single precision (which must use
movss).  The only platform where scalar SSE *may* be competitive or better is
the Core 2 Duo, and I haven't had a chance to run benchmarks there.  Note that
there is one performance advantage that x87 code will pretty much always have,
even once the archs improve their scalar SSE performance: it is much more
compact, because x87 opcodes were defined earlier in the CISC instruction set
(a register-register fadd encodes in 2 bytes, a scalar SSE2 addsd in 4).  This
can massively reduce your instruction load on heavily unrolled loops, and allow
more instructions to fit in the selection window.

Now, even if the performance were equal (rather than x87 being faster),
numerical guys would still sometimes prefer the x87, in order to get that free
extra precision.  Take epsilon to be double-precision machine epsilon.  The
80-bit format carries 11 more mantissa bits than a double (64 vs. 53), so if
10,000 flops are done in 80-bit precision, your worst-case error is still only
on the order of epsilon.  If they are done in 64-bit (SSE), your worst-case
error is roughly 10,000*epsilon.  Which would you prefer if you were in the
space ship whose flight path was being calculated? :)
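
The accumulation argument can be made concrete with a small C sketch; it is an
illustration, not a rigorous error analysis.  It adds the same double 10,000
times, once with a double accumulator (as scalar SSE2 code would) and once with
a `long double` accumulator rounded to double only at the end.  The summand 0.1
and the count 10,000 are illustrative choices, and `long double` is the x87
80-bit format only with GCC on x86; elsewhere it may be identical to double, in
which case both sums match.  The helper names are hypothetical, and
`summation_bound` uses the standard first-order bound for recursive summation,
about n times the unit roundoff 2^-p for a p-bit mantissa.

```c
#include <float.h>
#include <math.h>

/* Running total kept in double (53-bit mantissa), as scalar SSE2 code
   would keep it; every addition rounds to double. */
double sum_in_double(double x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x;
    return s;
}

/* Running total kept in long double (the x87 80-bit format, 64-bit
   mantissa, with GCC on x86), rounded to double once at the end. */
double sum_in_long_double(double x, int n)
{
    long double s = 0.0L;
    for (int i = 0; i < n; i++)
        s += x;
    return (double)s;
}

/* Absolute error of a computed result against a reference value. */
double abs_error(double got, double want)
{
    return fabs(got - want);
}

/* First-order worst-case relative error bound for recursive summation
   of n same-signed terms with a mant_dig-bit mantissa: about n * 2^-p,
   where 2^-p is the unit roundoff for round-to-nearest. */
double summation_bound(int n, int mant_dig)
{
    double u = 1.0;
    for (int i = 0; i < mant_dig; i++)
        u /= 2.0; /* exact: halving a power of two */
    return n * u;
}
```

On an x87 target, `abs_error(sum_in_long_double(0.1, 10000), 1000.0)` should
come out much smaller than the same expression for `sum_in_double` (1000.0 is
the double nearest to 10,000 times the double value of 0.1).  The bounds show
the same picture: `summation_bound(10000, 64) / DBL_EPSILON` is 2.44140625, a
few double epsilons, while the 53-bit bound is 5,000 times DBL_EPSILON (10,000
times the double unit roundoff).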

Thanks,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30255



