Re: cortex / arm-hardfloat-linux-gnueabi (was Re: armelfp: new architecture name for an armel variant)
+++ Matt Sealey [2010-07-15 16:39 -0500]:
> Hi Paul
> Please understand we know what we're talking about here :D
And so does Paul :-)
> In summary:
> soft:
> * FPU is all emulated. FPU work is done in integer registers.
> softfp:
> * actual FPU used, FPU argument passing done in integer registers due
> to the soft/softfp EABI spec. your 10x speedup is here and comes from
> using the FPU instead of emulating it
> * You can use NEON here but you still are limited to passing float
> arguments in integer registers per the ABI
> * Each register transfer from integer to float register costs about 20 cycles
> * Boost in performance from using the FPU or NEON instead of emulation
> * Hidden performance penalty from the register transfers
> * Compatible with the above - soft and softfp code can be mixed
> hard:
> * actual FPU is used in the same way
> * actual FPU code does not run faster
> * Boost in performance from using the FPU or NEON is the same
> * No hidden performance penalty
> * Completely incompatible ABI with the two above - no code mixing.
> That is what we're proposing.
Thanks for that clear and concise summary.
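To put a rough number on the hidden penalty in the softfp case, here is a
back-of-envelope sketch in Python. It takes the "about 20 cycles" per
transfer quoted above at face value. The function names, the default of one
transfer per float value, and any function-body cycle count you feed in are
illustrative assumptions, not measurements.

```python
# Back-of-envelope model of the softfp call overhead described above.
# Assumption taken from the summary: each integer<->float register
# transfer costs about 20 cycles. Everything else here is illustrative.

TRANSFER_CYCLES = 20  # figure quoted in the summary, not measured here


def softfp_call_overhead(float_args, returns_float=True,
                         transfers_per_value=1,
                         transfer_cycles=TRANSFER_CYCLES):
    """Estimate extra cycles per call under softfp relative to hard float.

    Counts `transfers_per_value` transfers for each float argument, plus
    the same again for a float return value. Real code may need more than
    one transfer per value (for example when the caller already holds the
    value in a VFP register), so treat the default as a lower bound.
    """
    values = float_args + (1 if returns_float else 0)
    return values * transfers_per_value * transfer_cycles


def overhead_fraction(body_cycles, float_args, returns_float=True):
    """Fraction of total softfp call time spent on register transfers."""
    extra = softfp_call_overhead(float_args, returns_float)
    return extra / (body_cycles + extra)
```

Under these assumptions a function taking two floats and returning one pays
softfp_call_overhead(2) = 60 extra cycles per call. Whether that matters
depends entirely on how much work the function body does, which is why
real measurements are needed.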
> This, coupled with the benefits of
> compiling for an improved ISA (ARMv7-A instead of ARMv4)
Debian's armel is actually v4t. Plain v4 was not supported (too hard, so
only StrongARM is disenfranchised).
> I am fairly sure you could (oh you did!) find a contrived benchmark to show that
> some code is faster on softfp in some cases, but taking a holistic
> approach I find it hard to believe that every time a floating point
> function is called across any of 20,000 packages possibly running on a
> system in a Debian port, that you will be able to benchmark a
> softfp+vfp system running faster than a hard+vfp one,
This remains a crucial question. If Paul is right then maybe it
doesn't actually make as much difference as you think. If we have both
of these builds then it shouldn't be too hard to measure their
relative performance. Ubuntu's existing armel flavour (softfp+vfp+thumb2
(v6.5), I think) is close to the necessary direct comparison with your
hardfp+vfp+ARM port. The ARM/thumb2 difference muddies the waters somewhat,
and a genuinely like-for-like comparison would be good. (I was under the
impression that thumb2 was usually faster in practice, so you may want to
build for that in fact, although I note that PB is not convinced on that point.)
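Once both builds exist, the comparison itself is simple. A minimal sketch in
Python, assuming you have wall-clock timings for the same benchmarks from
each build. The function names and the data layout are hypothetical, not an
existing tool, and no particular benchmark suite is implied.

```python
import math
import statistics


def relative_speed(baseline_times, candidate_times):
    """Return median(baseline) / median(candidate).

    A value above 1.0 means the candidate build (e.g. hardfp) ran faster
    than the baseline build (e.g. softfp) on this benchmark.
    """
    if not baseline_times or not candidate_times:
        raise ValueError("need at least one timing per build")
    return statistics.median(baseline_times) / statistics.median(candidate_times)


def summarise(results):
    """Map {benchmark: (baseline_times, candidate_times)} to {benchmark: ratio}."""
    return {name: relative_speed(base, cand)
            for name, (base, cand) in results.items()}


def geometric_mean(ratios):
    """Geometric mean of speed ratios, the usual way to average them."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

Medians are used so that one noisy run does not dominate, and the geometric
mean so that a 2x gain on one benchmark and a 2x loss on another average to
no change overall. Per-benchmark ratios are still worth reading on their
own, since a single contrived benchmark can point either way.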
> Anyway I think everyone is agreed that it should be done, just not on the name.
Well, right at the beginning of this discussion a number of people
said 'I'd like to see the numbers'. That remains true, and the results
will help determine what flavours are worth maintaining in the long
term. But of course in order to do that someone has to build the
necessary flavours/ports, so yes please, go for it. We await results
with bated breath.
We need some good benchmarks too.
is relevant to that I guess.
Principal hats: Linaro, Emdebian, Wookware, Balloonboard, ARM