Re: cortex / arm-hardfloat-linux-gnueabi (was Re: armelfp: new architecture name for an armel variant)
> In libm.so, I took sinf() -a very often used function, absolutely necessary
> for any trig stuff- and tried to actually find the differences using
> Do the math, there are 6 more vmov instructions (all between rX and sX
> registers) in the softfp versions. Ok, if one gives a stall of 20 cycles,
> how many cycles do we lose in sinf() alone?
Depends how the function is called. Worst case we lose 17 cycles, best case
we should be ~10 cycles faster.
Remember that mcr (i.e. vmov sX, rX) has zero latency, and the stall only
occurs when the value is used. Also remember that any comparison of a
floating point value introduces the same latency, whether it be due to copying
the value or the condition flags.
By my reading the only problematic instruction is the move of the return value
into r0. However this won't actually stall until the value is used. Even if
the caller uses the value immediately the rest of the function epilogue should
soak up a few cycles of that latency.
In general the first use of a function argument could cause a stall if the
caller loaded that argument with mrc. However if the value came from memory or
was already in core regs (e.g. because it was a function argument/return
In this case the first significant use of the value is a comparison. As
mentioned above this may cause a stall in the soft-float case, but will always
cause a stall in the hard-float case.
Likewise the stalls on the call to __kernel_sinf balance each other out -
softfp stalls on the return value, hard-float stalls on the argument