Bug#572746: libm: sinf/cosf performance is awful on amd64
On Sun, Mar 07, 2010 at 04:17:08PM +0100, Aurelien Jarno wrote:
> On Sat, Mar 06, 2010 at 11:42:51AM +0100, Jerome Vizcaino wrote:
> > Package: libc6
> > Version: 2.10.2-6
> > Severity: normal
> > Hi,
> > After many tests and research I've come to the conclusion that the float variants
> > of sin/cos (and maybe others) are abnormally slow on Debian amd64.
> > The performance loss is really impressive (around 8 to 9 times slower).
> > I've attached the prog used to make my experiments and used it in the following
> > cases.
> > + Lenny-amd64: sinf/cosf is really slow
> > + Lenny-i386: float performance is ok (faster than the cos/sin using double)
> > + Sid-amd64: sinf/cosf slow
> > + Lenny-amd64 using lenny-i386 binary and 32bits libs: float performance is OK.
> On amd64, only sincos has an optimized version, sincosf is using the
> generic C implementation. On i386, there are optimized versions of both
> sincos and sincosf.
> > + OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on lenny-amd64,
> > the tests run fine!
> > => The problem is not compiler related.
> > There seems to be a problem with the way libm is compiled for the amd64
> > architecture on Debian.
> > This is why the OpenSuse test was run: the problem is somewhere in the compile
> > chain or debian specific patches.
> The problem is clearly not Debian specific, and is also present
> upstream. OpenSuse is probably using a patch to work around the problem.
This is confirmed: they're using an AMD version of the libm library on
x86_64, still coded in C for the sincosf function.
A quick and dirty implementation of sincosf in x86_64 assembly gives me a
speed around 4% slower than sincos. What kind of performance ratio do
you get on SuSe?
The solution seems to be to write each *f function in x86_64 assembly, but
that'll probably take time.
Aurelien Jarno GPG: 1024D/F1BCDB73