Re: fftw: Usage of SSE in 64bit?
- From: "Steven G. Johnson" <firstname.lastname@example.org>
- Date: Mon, 20 Jun 2011 23:55:40 -0400
(Reposting, as this message does not seem to have gone through.)
I am one of the FFTW developers, and wanted to comment on this.
Yes, you should definitely use the --enable-sse/--enable-sse2 flags when
compiling single/double precision versions of FFTW on all x86 and x86-64
platforms. This is *not* just a matter of compiler flags -- it enables
the compilation of special computational kernels in FFTW that explicitly
use SSE/SSE2 intrinsics.
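As a rough sketch of what this looks like in practice, the two builds could be wrapped in a small helper like the one below. The function name build_fftw is hypothetical (it is not part of FFTW), and it assumes you are at the top of an unpacked FFTW 3.x source tree. The configure options themselves (--enable-sse2, --enable-sse, and --enable-float for single precision) are the ones described in the FFTW installation manual.

```shell
# Hypothetical helper (not part of FFTW): configure and build one precision
# of FFTW with its SIMD kernels enabled.
# Run from the top of an unpacked FFTW 3.x source tree.
# Usage: build_fftw double|single
build_fftw() {
    case "$1" in
        double) flags="--enable-sse2" ;;                # double precision is the default
        single) flags="--enable-float --enable-sse" ;;  # --enable-float selects single precision
        *) echo "usage: build_fftw double|single" >&2; return 2 ;;
    esac
    # $flags is intentionally unquoted so it splits into separate arguments.
    ./configure $flags && make
}
```

A packager would typically run "build_fftw double", install the result, run "make distclean", and then run "build_fftw single", since each precision is a separate build of the library.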
In addition to x86-64, note that this is SAFE to enable in general for
all 32-bit x86 platforms. FFTW checks at runtime to see whether the
processor supports SSE/SSE2 and disables its SSE/SSE2 code if not.
(Similarly for Altivec on PowerPC, and similarly in the next release for
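FFTW performs this runtime check itself, so nothing extra is needed from the user. If you just want to see what a given machine reports, the following is a minimal sketch for Linux only. It assumes the kernel exposes CPU feature flags on a "flags" line in /proc/cpuinfo; the function name has_cpu_flag is made up for illustration, and this is not how FFTW itself does the detection.

```shell
# Hypothetical helper: report whether a CPU feature flag (e.g. sse2) appears
# on the "flags" line of a cpuinfo-style file.
# Usage: has_cpu_flag FLAG [FILE]; FILE defaults to /proc/cpuinfo (Linux only).
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Split the first "flags" line into one word per line and require an exact
    # whole-line match, so that "sse" does not match "sse2" or "ssse3".
    grep -m1 '^flags' "$file" 2>/dev/null | tr ' \t' '\n\n' | grep -qxF "$flag"
}

for f in sse sse2; do
    if has_cpu_flag "$f"; then
        echo "$f: reported by the CPU"
    else
        echo "$f: not reported"
    fi
done
```

On a machine where a flag is not reported, an FFTW built with --enable-sse/--enable-sse2 still runs; it simply falls back to its non-SIMD code, as described above.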
In general, I would recommend that the packager read the FFTW
installation manual closely, since it documents these options.
For benchmarking, I would recommend using the "bench" program that comes
with FFTW. For example, you can compare a size-1024 FFT with and without
the SSE/SSE2 kernels by running the following two commands (the second
one disables the SIMD code):
./bench -opatient 1024
./bench -opatient -onosimd 1024
On my 64-bit Intel Xeon E5440 running FFTW 3.2.2 and Debian GNU/Linux,
the SSE/SSE2 version is faster for size 1024 by a factor of 1.7 in
double precision and by a factor of 3.4 in single precision.
Steven G. Johnson