Hi all,

this one dragged on longer than I wanted, but daily work sucks away a great deal of time :)

Attached to this message you will find 3 graphs and a tarball of summary results, all obtained on one of our compute nodes which was otherwise idle. The system is (still) running Debian Lenny in the amd64 flavor. The CPU is a Xeon X3220, 2.4 GHz quad core (though I only used a single core for these tests), with 8 GB RAM. FFT sizes range from 2**4 to 2**27 points, out-of-place transforms.

I used the stock Debian package as a reference (3.1.2-3.1) and recompiled versions thereof with different options (a mixture of --enable-sse, --enable-fma and --enable-alloca).

The most significant change came from enabling SSE on amd64, which gave almost a factor of two in speed. It's true that gcc automatically enables SSE enhancements on 64-bit, but it seems FFTW also has special code optimizations for SSE which we don't use with the stock Debian fftw.

Thus my request would be to use --enable-sse on amd64 as well, i.e. to patch the debian/rules file.
OK, let the discussion begin ;)

Cheers

Carsten

Gory details:

%%%%%%%%%%%%%%%%%%
debian-vs-optim-estimated-plan.svg

These tests were performed with the FFTW_ESTIMATE plan and the following FFTW compile options:

debian: stock libraries from /usr/lib
alloca: recompiled with --enable-alloca
fftw-default: baseline check, recompiled fftw without special options
fma: recompiled with --enable-fma
fma-sse-alloca: recompiled with --enable-fma --enable-alloca --enable-sse
sse: recompiled with --enable-sse
core2+all else: recompiled with -mtune=core2 and all of fma-sse-alloca

The clear result from this (apart from hitting different CPU cache size limits) is that just enabling SSE yields a performance boost of up to 100%.

%%%%%%%%%%%%%%%%%%%
debian-vs-optim-measure-plan.svg

Same as above, but now with FFTW_MEASURE, yielding essentially the same conclusion: we want to have better amd64 libraries in Debian ;)

%%%%%%%%%%%%%%%%%%%
debian-final.svg

Final comparison between the stock Debian fftw and the version recompiled with --enable-sse. Here one sees multiple things:

* Users want to have --enable-sse for amd64 :)
* Users should always use FFTW_MEASURE (or something even more thorough, and save their plan) if they plan to use fftw heavily.

%%%%%%%%%%%%%%%%%%%

Raw result files have a simple column-oriented structure:

1. size of FFT
2. theoretically needed flops to perform one FFT (5*N*log_2(N)/2)
3. time for plan generation in microseconds
4. time per FFT in nanoseconds
5. number of iterations (each test ran for at least 60s)
6. theoretical MFlops/s of CPU (that's what is plotted above)

The latter 4 columns are repeated for each plan, i.e. here they are given for FFTW_ESTIMATE and FFTW_MEASURE.
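For anyone who wants to re-check the plotted numbers from the raw files, here is a minimal Python sketch of how one row could be read and the MFlops/s column recomputed from columns 2 and 4. It is only an illustration under some assumptions: that the columns are whitespace-separated, that the FFTW_ESTIMATE group comes before the FFTW_MEASURE group, and that the names `parse_row`, `nominal_flops` and `mflops` are made up for this example. The sample row is invented to show the layout and is not a measured result.

```python
import math

# Assumed order of the repeated 4-column groups (not confirmed by the files).
PLANS = ("FFTW_ESTIMATE", "FFTW_MEASURE")


def nominal_flops(n):
    """Nominal flop count for one FFT of n points: 5*N*log2(N)/2."""
    return 5 * n * math.log2(n) / 2


def mflops(flops, ns_per_fft):
    """Convert flops per FFT and nanoseconds per FFT into MFlops/s."""
    # flops / (ns * 1e-9 s) / 1e6  ==  flops / ns * 1e3
    return flops / ns_per_fft * 1e3


def parse_row(line):
    """Split one result line into size, flops and one dict per plan."""
    fields = line.split()  # assumption: whitespace-separated columns
    row = {"size": int(fields[0]), "flops": float(fields[1]), "plans": {}}
    for i, plan in enumerate(PLANS):
        plan_us, fft_ns, iters, reported = fields[2 + 4 * i: 6 + 4 * i]
        row["plans"][plan] = {
            "plan_time_us": float(plan_us),
            "time_per_fft_ns": float(fft_ns),
            "iterations": int(float(iters)),
            "mflops_reported": float(reported),
            "mflops_recomputed": mflops(row["flops"], float(fft_ns)),
        }
    return row


if __name__ == "__main__":
    # Invented example row, NOT taken from the attached tarball.
    sample = "1024 25600 35 5120 11718750 5000 250000 2560 23437500 10000"
    for plan, values in parse_row(sample)["plans"].items():
        print(plan, values["mflops_recomputed"])
```

If the recomputed value and the reported sixth column disagree for a real row, the first thing to check is whether the assumed column order or separator matches the actual files.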
Attachment:
debian-vs-optim-measure-plan.svg
Description: image/svg
Attachment:
debian-vs-optim-estimated-plan.svg
Description: image/svg
Attachment:
debian-final.svg
Description: image/svg
Attachment:
fftw_raw_results.tar.gz
Description: application/compressed-tar