
Re: fftw3 non-pic k7 optimisations

On Wed, Mar 02, 2005 at 05:16:50PM +0100, Florian Weimer wrote:
> * Paul Brossier:
> > Two questions:
> >  - can anyone spot what in these codelets causes the non-pic ?
> Tables of constants are addressed directly, not in some IP-relative
> way.

Thanks, I suspected this might be the problem. So, if I understand correctly:

KP707106781KP707106781: .float +0.707106781186547524400844362104849039284835938, +0.707106781186547524400844362104849039284835938

is ok, but

        pfmul KP707106781KP707106781, %mm3

is not PIC-compliant, and should be replaced with something like

        pfmul KP707106781KP707106781(%ebx), %mm3

Given that I didn't know anything about assembly yesterday, how far off am I?
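
For comparison, here is a minimal sketch of the usual i386 ELF PIC idiom as I understand it: load the GOT address into %ebx, then address the constant relative to it with @GOTOFF. The function name scale_pair, the local thunk .Lget_pc_ebx and the placement of the table in .rodata are my own assumptions for illustration, not taken from the fftw3 codelets, and I have not run this on K7 hardware.

```asm
        .section .rodata
        .align  8
KP707106781KP707106781:
        .float  0.707106781186547524, 0.707106781186547524

        .text
        .globl  scale_pair              # hypothetical: void scale_pair(float *p)
        .type   scale_pair, @function
scale_pair:
        pushl   %ebx                    # %ebx is callee-saved in the i386 ABI
        call    .Lget_pc_ebx            # %ebx = address of the next instruction
        addl    $_GLOBAL_OFFSET_TABLE_, %ebx    # %ebx = address of the GOT
        movl    8(%esp), %eax           # p: skip saved %ebx and return address
        movq    (%eax), %mm3            # load two packed floats
        pfmul   KP707106781KP707106781@GOTOFF(%ebx), %mm3  # GOT-relative, no text relocation
        movq    %mm3, (%eax)            # store the scaled pair
        femms                           # leave MMX/3DNow! state
        popl    %ebx
        ret

.Lget_pc_ebx:                           # hypothetical local thunk
        movl    (%esp), %ebx            # copy our return address into %ebx
        ret
```

If this is right, the cost is one integer register (%ebx) tied up for the GOT pointer plus the thunk call on function entry, which would fit the remark below about PIC being slower when all integer registers are in use.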

> >  - how much can it hurt to have this non-pic in fftw3 ?
> It shouldn't matter much if all PIC code is grouped together in the
> binary because few pages have to be copied in this case.  PIC code
> itself is always slower, significantly so if the code is using all
> available integer registers.

I ran some tests with 1024-point FFTs on an AMD 700, and building
without --enable-k7 was roughly 1.5% slower than building with it.
The drop could be larger on K7 for smaller transform sizes, where
the k7/3dnow optimisations come into play.

If there are no objections, and as suggested by upstream, I will
disable the k7 optimisations to make sure that libfftw3f.so
remains PIC-compliant, even though it will be a little slower.

cheers, piem
