
Re: cortex / arm-hardfloat-linux-gnueabi (was Re: armelfp: new architecture name for an armel variant)



> In libm.so, I took sinf() -a very often used function, absolutely necessary
> for any trig stuff- and tried to actually find the differences using
> objdump:
>...
> Do the math, there are 6 more vmov instructions (all between rX and sX
> registers) in the softfp versions. Ok, if one gives a stall of 20 cycles,
> how many cycles do we lose in sinf() alone?

It depends on how the function is called. Worst case we lose 17 cycles; best
case we should be ~10 cycles faster.

Remember that mcr (i.e. vmov sX, rX) has zero latency, and the stall only
occurs when the value is used. Also remember that any comparison of a
floating point value introduces the same latency, whether it comes from
copying the value itself or from copying the condition flags.

By my reading the only problematic instruction is the move of the return value
into r0. However, this won't actually stall until the value is used. Even if
the caller uses the value immediately, the rest of the function epilogue should
soak up a few cycles of that latency.
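To make the "stall only when used" point concrete, here is a toy model. It is a
sketch, not a description of any real core's pipeline: it assumes a simple
in-order machine where the VFP-to-core move (vmov rX, sX, i.e. mrc) has a fixed
result latency, borrows the 20-cycle figure from the quoted mail, and uses a
hypothetical 3-cycle epilogue. The names stall_cycles, LATENCY and EPILOGUE are
made up for illustration.

```python
# Toy in-order pipeline model (hypothetical, for illustration only).
# A VFP->core transfer ("vmov rX, sX", i.e. mrc) issues at cycle `issue`;
# its result is available `latency` cycles later. A consumer stalls only
# if it reads the destination register before that.


def stall_cycles(issue: int, latency: int, first_use: int) -> int:
    """Cycles the first consumer waits for a transferred value."""
    ready = issue + latency
    return max(0, ready - first_use)


LATENCY = 20  # assumed: the "stall of 20 cycles" from the quoted mail
EPILOGUE = 3  # assumed: epilogue cycles between the move and the caller's use

if __name__ == "__main__":
    # Return value moved into r0 at cycle 0; caller reads r0 on the next cycle.
    immediate = stall_cycles(0, LATENCY, 1)
    # Same move, but the epilogue issues before the caller reads r0.
    after_epilogue = stall_cycles(0, LATENCY, 1 + EPILOGUE)
    print(f"immediate use: {immediate} stall cycles")
    print(f"after {EPILOGUE}-cycle epilogue: {after_epilogue} stall cycles")
```

Every cycle of work between the move and the first read comes straight off the
stall, which is why the epilogue hides part of it, and why a zero-latency move
(latency 0 in this model) never stalls.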

In general the first use of a function argument could cause a stall if the
caller loaded that argument with mrc. However, if the value came from memory or
was already in core registers (e.g. because it was itself a function argument
or return value), there is no such stall.

In this case the first significant use of the value is a comparison. As
mentioned above, this may cause a stall in the softfp case, but will always
cause a stall in the hard-float case.

Likewise the stalls on the call to __kernel_sinf balance each other out:
softfp stalls on the return value, hard-float stalls on the argument
comparison.

Paul

