
Re: fftw: Usage of SSE in 64bit?

(Reposting, as this message does not seem to have gone through.)

I am one of the FFTW developers, and wanted to comment on this.

Yes, you should definitely use the --enable-sse/--enable-sse2 configure flags when compiling the single/double precision versions of FFTW, respectively, on all x86 and x86-64 platforms. This is *not* just a matter of compiler flags -- it enables the compilation of special computational kernels in FFTW that explicitly use SSE/SSE2 intrinsics.
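
As a rough sketch of how a packaging script might pick these flags for each precision: the helper name fftw_simd_flags is hypothetical, and the mapping assumes FFTW 3.2.x, where the single-precision library is selected with --enable-float.

```shell
# Hypothetical helper (not part of FFTW): print the configure flags that
# enable the SIMD kernels for the requested precision.
# Assumes FFTW 3.2.x: single precision is built with --enable-float.
fftw_simd_flags() {
    case "$1" in
        single) echo "--enable-float --enable-sse" ;;
        double) echo "--enable-sse2" ;;
        *) echo "unknown precision: $1" >&2; return 1 ;;
    esac
}

# Usage, from an unpacked FFTW source tree:
#   ./configure $(fftw_simd_flags double) && make
#   ./configure $(fftw_simd_flags single) && make
```

The single- and double-precision libraries are separate builds, so a package that ships both would run configure once per precision.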

In addition to x86-64, note that this is SAFE to enable in general for all 32-bit x86 platforms. FFTW checks at runtime to see whether the processor supports SSE/SSE2 and disables its SSE/SSE2 code if not. (Similarly for AltiVec on PowerPC, and similarly in the next release for AVX instructions.)
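
To see what such a runtime check would find on a given machine, something like the following could be used. This is only an illustration, not how FFTW performs its check internally: it assumes a Linux system that lists CPU features on a "flags" line in /proc/cpuinfo, and the helper name cpu_has_flag is hypothetical.

```shell
# Hypothetical helper: succeed if CPU flag $1 appears on the first "flags"
# line of the cpuinfo file $2 (default /proc/cpuinfo, Linux x86 only).
cpu_has_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

# Report the two instruction sets discussed here.
for f in sse sse2; do
    if cpu_has_flag "$f"; then
        echo "$f: supported"
    else
        echo "$f: not reported"
    fi
done
```

Because the check is a whole-word match, asking for sse does not accidentally match sse2 or ssse3.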

In general, I would recommend that the packager read the FFTW installation manual closely, since it documents these options.

For benchmarking, I would recommend using the "bench" program that comes with FFTW. For example, you can compare a size-1024 FFT with and without the SSE/SSE2 kernels just by doing:
    ./bench -opatient 1024
    ./bench -opatient -onosimd 1024
On my 64-bit Intel Xeon E5440 running FFTW 3.2.2 and Debian GNU/Linux, the SSE/SSE2 version is faster for size 1024 by a factor of 1.7 in double precision and by a factor of 3.4 in single precision.
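
To turn two such timings into a speedup factor like the ones quoted above, a small helper along these lines may be handy. The name speedup is hypothetical, and it assumes the two times have been read off the bench output by hand, in the same unit.

```shell
# Hypothetical helper: print how many times faster the SIMD run is,
# given the non-SIMD time ($1) and the SIMD time ($2) in the same unit.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        if (fast == 0) exit 1          # avoid dividing by zero
        printf "%.2f\n", slow / fast
    }'
}

# Example with made-up times (not measurements):
#   speedup 6.8 2.0    # prints 3.40
```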

Steven G. Johnson
