
Re: cortex / arm-hardfloat-linux-gnueabi (was Re: armelfp: new architecture name for an armel variant)



> > Enabling use of VFP does not require use of the hard-float ABI. Please
> > don't confuse the two.
> 
> The whole point of the port is that we get rid of the softfloat ABI in
> order to use the VFP unit without playing around moving
> registers around. This sort of came about from Konstantinos' porting
> of the Eigen2 library (after he had done it for AltiVec)
> to NEON and some of the developers noticed it wasn't so much faster
> because gcc inserts what can only be described as
> evil between the start of the function and the real meat of the code.
> The pipeline stalls for register movement are noticeable
> in real code as a 20% or higher performance hit.

Yes, but the point I was responding to is that you don't necessarily need to 
use hard-float ABI to get most of the performance gain.
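
To make the distinction concrete, here is a minimal sketch. The function name 
scale_add is made up for illustration; the options mentioned in the comments 
are the standard GCC ARM ones (-mfpu and -mfloat-abi). The source is the same 
in both cases, only the calling convention differs.

```c
/*
 * scale_add is a hypothetical example, not code from Eigen2 or any package.
 *
 * Built for ARM with:  -mfpu=vfp -mfloat-abi=softfp
 *   The multiply and add are done with VFP instructions, but a, x and y
 *   still arrive in core registers (r0, r1, r2) and the result goes back
 *   in r0. The object stays link-compatible with soft-float armel code.
 *
 * Built for ARM with:  -mfpu=vfp -mfloat-abi=hard
 *   The arguments arrive in VFP registers (s0, s1, s2) and the result is
 *   returned in s0. No core<->VFP transfers are needed at the call
 *   boundary, but the object can no longer be linked against soft-float
 *   ABI code, hence the need for a new port.
 */
float scale_add(float a, float x, float y)
{
    return a * x + y;
}
```

The register transfers only happen at function boundaries, so in the softfp 
case the cost depends on how often floating-point values cross a call that 
the compiler could not inline.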

I completely agree that if you want to use the hard-float ABI then you need a 
new port.

However, changing the ABI doesn't solve many of the underlying problems, 
specifically how to provide optimized binaries that take advantage of new 
features on modern CPUs while still supporting older hardware.

Switching to the hard-float ABI certainly does give some benefit. While 20% 
isn't a trivial difference, it's important to keep this in context. It comes 
on top of what I'd guess is a 10x speedup from using VFP at all, which is 
achieved without breaking the ABI or requiring a whole new port. If you're 
really serious about performance then a NEON-optimized version of your 
critical code should get you another 4x or so on a Cortex-A8.

> What would not be so great is that even if it was fixed, the option to
> use a faster floating point ABI drags in a clone of
> every package on your system (at the very least, libc, libm, and all
> the system library dependencies) increasing the
> size of the installed system.

What you're describing here is multiarch.

Paul

