
Re: GPGPU computing

On Tue, 8 Sep 2009 11:22:41 +0200
Francesco Pietra <chiendarret@gmail.com> wrote:

> Chemical computations, such as of molecular dynamics, that rely on
> clusters or uma-type computers, are starting to be performed through
> GPGPU technology, that is by putting graphical boards to general
> floating point use. The first reports are of 10 to 80 times speeding
> up with respect to the best single processors, i.e., something that so
> far required big multicore machines for traditional computing. NVIDIA
> CUDA seems to be a leader in this area.
> As an amd64 user on traditional uma-type keyboards or clusters, may I
> ask where to get independent information as to the hardware/software
> required for GPGPU computing?
> thanks
> francesco pietra

There is also:


As far as Nvidia is concerned, any GeForce 8 series or newer card is
supported by CUDA. You can find the complete list at:


I've been looking into this as well for implementing a high-performance
real-time measurement system (high-speed camera connected via gigabit
ethernet, and real-time image processing on the GPU), and I've come to
understand three things about GPGPU, which might be of interest to you
as well:

1) One problem with GPGPU is the overhead of transferring data to and
from the GPU. The computation must be "heavy" enough to make good use
of the massively parallel GPU and to hide the transfer delays.
Otherwise, you might find that the GPU-based implementation is slower
than the CPU-based one.

2) Closely related to (1) is the importance of proper memory management
on the GPU. There are many papers and publications about this out there.

3) All the incredible performance figures quoted by manufacturers
(now in excess of 1 TFLOPS) are for single-precision floating-point
math. If you need double precision, then you should look in the "finer
print", where you will see that double precision is about 5-10 times
slower, or more. As an example, the Nvidia Tesla C1060 claims
933 GFLOPS in single precision and 78 GFLOPS in double precision.
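
To put rough numbers on points (1) and (3), here is a minimal Python
sketch. It assumes a very simple model in which the GPU time is just
compute time plus transfer time, with no overlap between the two. The
function name, the bus bandwidth, the frame size, and the timings are
all made up for illustration; only the 933 and 78 GFLOPS figures come
from the Tesla C1060 numbers above.

```python
def offload_speedup(cpu_seconds, gpu_compute_seconds,
                    bytes_moved, bus_bytes_per_second):
    """Speedup of GPU over CPU when transfer time is added to GPU time.

    Simplified model (an assumption): transfers and compute do not overlap.
    """
    transfer_seconds = bytes_moved / bus_bytes_per_second
    return cpu_seconds / (gpu_compute_seconds + transfer_seconds)


# Hypothetical values, not measurements.
BUS = 4e9    # assumed effective host<->GPU bandwidth, in bytes per second
FRAME = 8e6  # assumed data moved per frame (upload + download), in bytes

# "Light" workload: the kernel is fast, so the transfer dominates.
light = offload_speedup(cpu_seconds=1e-3, gpu_compute_seconds=0.05e-3,
                        bytes_moved=FRAME, bus_bytes_per_second=BUS)

# "Heavy" workload: same data volume, but far more arithmetic per byte.
heavy = offload_speedup(cpu_seconds=100e-3, gpu_compute_seconds=2e-3,
                        bytes_moved=FRAME, bus_bytes_per_second=BUS)

print(f"light workload speedup: {light:.2f}x")
print(f"heavy workload speedup: {heavy:.2f}x")

# Point (3): single- vs double-precision peak figures quoted for the C1060.
sp_gflops, dp_gflops = 933, 78
precision_ratio = sp_gflops / dp_gflops
print(f"single/double peak ratio: {precision_ratio:.1f}")
```

With these made-up inputs the light workload comes out slower on the
GPU than on the CPU (speedup below 1), while the heavy one still gains
a lot despite paying the same transfer cost. Real code can overlap
transfers with computation, so treat this as a pessimistic first guess.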

