Re: fftw: Usage of SSE in 64bit?

To: debian-science@lists.debian.org
Subject: Re: fftw: Usage of SSE in 64bit?
From: "Steven G. Johnson" <stevenj@alum.mit.edu>
Date: Tue, 21 Jun 2011 12:25:47 -0400
Message-id: <itqgmb$jsn$1@dough.gmane.org>
In-reply-to: <201106210740.23681.carsten.aulbert@aei.mpg.de>
References: <201103231452.27320.carsten.aulbert@aei.mpg.de> <itp4ns$ate$1@dough.gmane.org> <201106210740.23681.carsten.aulbert@aei.mpg.de>

On 6/21/11 1:40 AM, Carsten Aulbert wrote:

In addition to x86-64, note that this is SAFE to enable in general for
all 32-bit x86 platforms.  FFTW checks at runtime to see whether the
processor supports SSE/SSE2 and disables its SSE/SSE2 code if not.
(Similarly for Altivec on PowerPC, and similarly in the next release for
AVX instructions.)


Well, that depends on what you are aiming for. If you want to have a single 32bit
x86 package which is guaranteed to work for all x86 compatible CPUs out there
starting say at a Pentium II level you have to ensure that this will still
work - for my case where I have ~ 1800 computers doing number crunching and
all are 64bit this is another matter than the one Debian has for packaging.

Your example seems a little off because presumably your cluster uses the amd64 distro and would not be using i386 packages at all.

However, the larger point is that FFTW is designed to reduce this tension between portability and performance. Running in 32-bit mode, it is indeed the case that you can have a *single* FFTW binary that runs on everything from (literally) a 386 to a modern processor, and still get near-optimal performance (for 32-bit mode) on the modern processor. Features like SSE2 are automatically enabled on the modern processor and disabled on old processors like the 386, because we explicitly segregated the new instructions into separate kernels that we can disable at runtime.
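
The runtime selection described above can be sketched roughly as follows. This is only an illustration of the dispatch pattern in Python, not FFTW's actual mechanism (FFTW is written in C and queries the processor directly). The function names, the kernel table, and the use of /proc/cpuinfo-style "flags" text are all assumptions made for the example.

```python
# Illustrative sketch only: every name here is hypothetical, and both
# "kernels" are plain Python stand-ins that compute the same result.

def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags found in /proc/cpuinfo-style text (assumed Linux x86 format)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def kernel_generic(values):
    """Portable fallback that uses no optional instructions."""
    return [2.0 * v for v in values]


def kernel_sse2(values):
    """Stand-in for a kernel that would use SSE2 instructions in a real library."""
    return [2.0 * v for v in values]


# Ordered from most to least specialized: (required flag, kernel).
KERNELS = [("sse2", kernel_sse2), (None, kernel_generic)]


def select_kernel(flags):
    """Pick the first kernel whose required flag is present; the generic one always qualifies."""
    for required, kernel in KERNELS:
        if required is None or required in flags:
            return kernel
    raise RuntimeError("no usable kernel")
```

Because the specialized code sits in its own kernel, a processor without the flag simply never calls it, which is what lets one binary serve both old and new machines.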

This way, Debian can have a single binary package of FFTW for each architecture without sacrificing performance on modern processors.

(Note that Debian should configure FFTW with --enable-portable-binary to use -mtune instead of -march ... last I checked, Debian already did this. From what I recall, this makes a near-negligible difference in performance. FFTW may be somewhat unusual in that it doesn't benefit too much from arch-specific compiler cleverness ... indeed, we actually have to manually disable some of gcc's optimizations to prevent them from screwing up our code schedule.)

For benchmarking, I would recommend using the "bench" program that comes
with FFTW. e.g. you can compare for a size-1024 FFT with and without the
SSE/SSE2 kernels just by doing:
      ./bench -opatient 1024
      ./bench -opatient -onosimd 1024
On my 64-bit Intel Xeon E5440 running FFTW 3.2.2 and Debian GNU/Linux,
the SSE/SSE2 version is faster for size 1024 by a factor of 1.7 in
double precision and by a factor of 3.4 in single precision.


Interesting, I think I need to rerun my tests again but then again this could
be that I was just using a 'measured' plan.

No, I get exactly the same performance in measured mode (omit -opatient above) vs. patient mode -- for such a small transform they give the same algorithm. (There is a sacrifice in estimate [-oestimate] mode, but even there I get a 1.5 speedup in double precision and a 3.32 speedup in single precision.) This is just a stock FFTW 3.2.2 with ./configure --enable-sse --enable-float or ./configure --enable-sse2, with FFTW's default compiler flags.

Possibly you are using FFTW suboptimally in some other way, or there is a problem with your benchmark. e.g. are you including the plan creation time (or worse, re-creating the plan for each transform)? Or possibly you have some other problem (e.g. if you repeatedly FFT the same nonzero array, it is a diverging process and eventually you are timing floating-point exceptions). If you don't obtain speedups comparable to mine using FFTW's bench program as above, please email fftw@fftw.org.
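
To illustrate the two pitfalls above, here is a minimal sketch using a naive O(n^2) DFT in pure Python as a stand-in for FFTW. The names make_plan, execute, growth_per_transform and transforms_until_overflow are hypothetical and are not FFTW API. The "plan" here is just a precomputed twiddle table, built once outside the loop. Because the transform is unnormalized, as FFTW's is, each pass multiplies the array's Euclidean norm by sqrt(n), so repeatedly transforming the same array grows without bound.

```python
import cmath
import math
import sys


def make_plan(n):
    """Precompute the n twiddle factors exp(-2*pi*i*k/n) once (a toy stand-in for a plan)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]


def execute(plan, x):
    """Unnormalized forward DFT of x using the precomputed twiddles (naive O(n^2))."""
    n = len(plan)
    return [sum(x[j] * plan[(j * k) % n] for j in range(n)) for k in range(n)]


def norm(x):
    """Euclidean norm of a complex sequence."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))


def growth_per_transform(n, x, repeats):
    """Transform the same data repeatedly and record the norm ratio after each pass."""
    plan = make_plan(n)  # created once, not inside the timed loop
    ratios = []
    for _ in range(repeats):
        before = norm(x)
        x = execute(plan, x)
        ratios.append(norm(x) / before)
    return ratios


def transforms_until_overflow(n):
    """Estimate the passes needed for a unit-norm array's norm to exceed the largest double."""
    return math.ceil(math.log(sys.float_info.max) / math.log(math.sqrt(n)))
```

For n = 1024 the norm grows by a factor of 32 per pass, so under this estimate a benchmark loop that feeds each output back in as the next input would leave double-precision range after roughly 200 transforms. Transforming a fresh (or re-initialized) input on each iteration avoids the problem.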


Steven

PS. A general comment: the FFTW authors use Debian ourselves, and we are very willing to offer advice to Debian packagers (or indeed to packagers for any GNU/Linux distro). Although I try to search mailing lists occasionally, it would be easier for us to keep on top of things if Debian made a greater effort to contact upstream authors when issues arise.
