
Re: cortex / arm-hardfloat-linux-gnueabi (was Re: armelfp: new architecture name for an armel variant)



On Fri, Jul 16, 2010 at 9:23 AM, Wookey <wookey@wookware.org> wrote:
> +++ Konstantinos Margaritis [2010-07-16 16:04 +0300]:
>> On Friday 16 July 2010 15:36:26 Wookey wrote:
>> > Just to save others a bit of time:
>> >
>> > The summary from that lot is that hardfp is about 4% faster on average
>> > (between 0 and 11% on various tests). So definitely faster for
>> > real-world stuff, but 4% won't justify a new port from Debian's POV.
>>
>> 4% won't, but 30% might?
>
> Yes. Exactly.

In reality the speedup could be anything between 0 and maybe 25%, with
some outliers that perform much, much better for some reason. In any
case, compiling the distro for armv7-a instead of armv4t lets the
compiler apply further optimizations on top of the FPU ABI change,
which improves the chances of it being faster (I don't think
Konstantinos has tested this yet).

>> Or what about the huge gain in povray?
>
> That was dramatic. I guess it leads on to the question of how many
> more really dramatic speedups there are in software people actually
> use (or would if it was faster :-)

See above. In theory, anything that does its FPU work through functions
that are passed floating-point arguments should benefit. For all we
know a standard desktop is not improved: sitting there, clicking a
rectangular menu, showing all of 5 icons at a time. Imagine scrolling a
folder with 5000 files, each with an SVG icon that needs rendering and
caching. That may improve a little, but there is much more going on
than the math behind the SVG.

As I summarized, the idea is that there are no *hidden, potentially
expensive register transfers* inserted by the compiler just to service
an ABI which is pretty much pointless on the range of processors we're
suggesting optimizing for.
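
A minimal sketch of what I mean (scale_add and sum_scaled are made-up
names for illustration, not taken from any benchmark in this thread).
With GCC's -mfloat-abi=softfp, the float arguments to a non-inlined
function are passed in core registers and have to be moved into VFP
registers before the FPU can use them; with -mfloat-abi=hard they are
passed in VFP registers directly. Comparing the gcc -S output for the
two settings should show the extra transfers around each call:

```c
#include <stddef.h>

/* Hypothetical helper. noinline keeps the call, and therefore the
 * ABI-mandated argument passing, from being optimized away. */
__attribute__((noinline))
float scale_add(float a, float b, float k)
{
    return a + k * b;
}

/* Calls scale_add once per element, so every iteration crosses the
 * function-call boundary with three float arguments. */
float sum_scaled(const float *v, size_t n, float k)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = scale_add(acc, v[i], k);
    return acc;
}
```

How much the transfers cost depends on the core, so treat this as
something to inspect and measure, not as a predicted speedup.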

sinf on its own is not a good test: it's one function out of many.
Layer 50 or 100 calls to other FPU functions, with more register usage
and more float or double argument passing (hey guys, try double; hardfp
will win every time IMO), and the benefit compounds.
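
Something along these lines would be a more telling test than a single
sinf call. It is only a sketch: mix2, blend and run_chain are invented
names, and the arithmetic is arbitrary. The point is the shape, several
layers of non-inlined calls that each take and return doubles. Under
the softfp convention each double occupies a pair of core registers,
so a four-double call like blend runs out of argument registers and
spills to the stack, whereas the hard-float convention can keep all of
them in VFP registers:

```c
/* Hypothetical innermost layer: linear interpolation on doubles. */
__attribute__((noinline))
static double mix2(double a, double b, double t)
{
    return a + (b - a) * t;
}

/* Hypothetical middle layer: four double arguments, three nested
 * calls, so double values cross a call boundary repeatedly. */
__attribute__((noinline))
static double blend(double a, double b, double c, double t)
{
    return mix2(mix2(a, b, t), mix2(b, c, t), t);
}

/* Outer loop: repeat the call chain so the per-call overhead
 * dominates whatever is being timed. */
double run_chain(int iterations)
{
    double acc = 0.0;
    for (int i = 0; i < iterations; i++)
        acc += blend(0.0, 1.0, 2.0, 0.5);
    return acc;
}
```

Build it once with -mfloat-abi=softfp and once with -mfloat-abi=hard,
time run_chain over a large iteration count, and compare.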

-- Matt

