
Fwd: workstation for CUDA



I also forgot a very important aspect of CUDA in the choice of the
mainboard. Currently, the GPUs carry out only a part, albeit the major
one, of the task (non-bonded forces), while bonded forces and PME
long-range forces are left to the CPUs. In practice, a CPU to GPU ratio
of 2:1 is still needed. The situation is not likely to change rapidly,
given the enormous task of parallelizing over the myriad of loci on
GPUs. Perhaps developers are waiting until integrated GPU-CPU boards
are available.
francesco pietra
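
As a rough illustration of the 2:1 rule of thumb above, the short
Python sketch below estimates how many CPU cores a given number of GPUs
would call for. It assumes the ratio means two CPU cores per GPU; the
message does not say whether cores or sockets are meant, and the
function name is only illustrative.

```python
# Assumption: "CPU to GPU 2:1" is read as two CPU cores per GPU.
CPU_CORES_PER_GPU = 2


def cpu_cores_needed(num_gpus, cores_per_gpu=CPU_CORES_PER_GPU):
    """Return the CPU core count suggested by the rule of thumb."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    return num_gpus * cores_per_gpu


# Core counts for one, two and four graphics cards.
for gpus in (1, 2, 4):
    print(gpus, "GPU(s) ->", cpu_cores_needed(gpus), "CPU cores")
```

Under this reading, a four-GPU workstation would want about eight CPU
cores, which bears directly on the choice of CPU and mainboard.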


---------- Forwarded message ----------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Sat, May 14, 2011 at 6:41 PM
Subject: Fwd: workstation for CUDA
To: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>, amd64 Debian
<debian-amd64@lists.debian.org>


Following a market search, I have to reformulate my question by
replacing GTX-470 with GTX-570. The former seems to have nearly
disappeared, or it costs nearly as much as the latter.
francesco pietra


---------- Forwarded message ----------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Sat, May 14, 2011 at 8:06 AM
Subject: Re: workstation for CUDA
To: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>


Hi:
In the meantime I had the opportunity to carry out my simulations on
both a high-end Tesla GPU computer and a very simple consumer-type
computer based on a single GTX-470, both running Debian amd64
lenny. As all classical molecular dynamics codes are compiled for
single precision, there was no advantage with the Tesla; actually, the
GTX-470 ran a bit faster when comparing equal numbers of graphics cards.

Therefore, as I will bear all expenses for this computer myself, I am
looking for up-to-date consumer components, except for the power
supply, hard disks and fans, which should be server-type. Ideally, the
motherboard should support four GTX-470 cards, or fewer if no such
consumer motherboard exists. No X server will be installed, graphics
being examined on an ssh-linked desktop. The choice of the CPUs and the
power supply follows from this. As the simulations last many days, the
case must have room for many large-diameter fans, for use in the open
air. On a server-type, four-socket CPU machine that I set up a few
years ago, there are eleven fans on the case and no air cooler was ever
needed at the latitude where I am based.

I am at a loss whenever I have to select hardware, especially here.
This type of computation requires little RAM (a few GB in the
four-GTX-470 case), performance depending entirely on the CPU/GPU. I
understand that a consumer motherboard may well be a bottleneck, and it
should not be. This explains my difficulty in choosing the essential
hardware. I am prepared to accept a compromise in performance.

Thanks a lot for any advice.
francesco pietra

On Wed, Jan 26, 2011 at 7:27 PM, Lennart Sorensen
<lsorense@csclub.uwaterloo.ca> wrote:
> On Tue, Jan 25, 2011 at 05:00:51PM -0500, Lennart Sorensen wrote:
>> Most boards technically don't have enough bandwidth.  However when
>> they use the NF200 PCIe switch, it tends to work quite well.  The NF200
>> actually allows broadcast of the same data to multiple cards if that is
>> what is needed.  Each card has potentially 16 lanes, but it is sharing
>> with another card for those 16 lanes.  If you are sending data to just
>> one card, it will get full speed.  If you are sending to both at once,
>> they get half speed, unless you are sending the same data to both in
>> which case they can get full bandwidth.
>>
>> Some designs have simply got 8 lanes per slot all the time.  That will
>> be slower of course.  So if it has the NF200 it should be very good,
>> and otherwise it will be speed limited to 8x.  Now if you are doing heavy
>> calculations with data that fits in the card, it doesn't really matter.
>> If your data set is larger than the card can hold and you have to move
>> data to the card all the time, the bandwidth could be an issue.  That's
>> also when the extra memory on a tesla might start to make a difference.
>>
>> Really serious boards do have enough lanes for 16x on each slot.
>>
>> The Tyan board I listed has the intel 5520 chipset, which has 36 lanes,
>> but since it has two 5520's, each with 36 lanes, it actually has enough
>> to run all four slots at 16x all the time.  So that is probably going
>> to be as fast as you can currently get.  The supermicro dual opteron
>> board in the machine you mentioned also has dual chipset, which also
>> gives it full 16x on all four slots.  The Asus P6T7 WS board uses the
>> NF200 chips instead.  It only has 36 lanes (and of course only one
>> CPU socket).
>
> Of course for really crazy (and expensive) there is this:
> http://www.colfax-intl.com/ms_tesla.asp?M=102
>
> Dual xeon and up to 8 tesla cards (it uses PCIe switches to share 16
> lanes between pairs of cards).
>
> Price is rather high when filled with tesla cards and ram and such.
> Probably fast though.
>
> --
> Len Sorensen
>
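
The lane arithmetic in the quoted message can be sketched in a few
lines of Python. This is only a back-of-the-envelope model: it assumes
every chipset lane is available to the graphics slots (in practice some
go to other devices), and the function names are illustrative, not part
of any tool.

```python
def lanes_per_slot(total_lanes, num_slots, max_width=16):
    """Widest standard PCIe link width that all slots can have at once
    when the lanes are split evenly, with no switch chip involved."""
    for width in (16, 8, 4, 2, 1):
        if width <= max_width and width * num_slots <= total_lanes:
            return width
    return 0


def shared_lanes_per_card(cards_receiving, same_data=False, shared_lanes=16):
    """Effective lanes per card behind a switch (such as the NF200) that
    shares one 16-lane uplink, following the quoted description."""
    if cards_receiving <= 0:
        return 0
    if same_data or cards_receiving == 1:
        # A single target, or a broadcast of identical data, gets full width.
        return shared_lanes
    return shared_lanes // cards_receiving


# Two 5520 chipsets with 36 lanes each, four slots: full x16 everywhere.
print(lanes_per_slot(2 * 36, 4))                 # 16
# A single 36-lane chipset without a switch: limited to x8 per slot.
print(lanes_per_slot(36, 4))                     # 8
# Two cards behind one switch receiving different data at once.
print(shared_lanes_per_card(2))                  # 8
# The same two cards receiving the same data via broadcast.
print(shared_lanes_per_card(2, same_data=True))  # 16
```

As the quoted message notes, the difference only matters when data has
to be moved to the cards continuously; if the data set fits in card
memory, x8 versus x16 makes little difference.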

