
Re: Advice on cluster hardware



How your applications work is really the main factor in deciding what type
of compute system they should run on.  Unfortunately, not all of us have
money for big SMP machines.  With your modest budget for 10 to 15 compute
nodes you would be best served by sticking with Gigabit Ethernet for the
interconnect between machines.  Depending on how chatty your parallel
applications are with one another, the throughput that GigE provides is
generally good enough for such a small machine count.  If you do choose to
look at high-speed interconnects, such as Myrinet or InfiniBand, expect to
easily add $3000 per node to the final cost of your solution.
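
To put that figure in perspective for a 10 to 15 node purchase, here is a
rough sketch of the arithmetic.  The function name is made up for
illustration, and a flat $3000 per node is a simplifying assumption; real
quotes will vary with switch size and vendor.

```python
# Rough interconnect budget, using the ~$3000/node figure mentioned above.
# interconnect_premium is a hypothetical helper, and the flat per-node
# cost is an assumption, not a vendor quote.

def interconnect_premium(nodes, per_node_usd=3000):
    """Extra cost of a high-speed interconnect across the whole cluster."""
    return nodes * per_node_usd

for nodes in (10, 15):
    print(f"{nodes} nodes: about ${interconnect_premium(nodes):,} extra")
```

On a small budget that extra money would otherwise buy several more
compute nodes, which is why GigE is usually the better starting point.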

To get the most bang for your buck, I would recommend not buying the
latest and greatest processors when spec'ing out your cluster.  When I was
a cluster-buying customer I would limit myself to $1000 per CPU.  I often
make the same recommendation to people buying new clusters, especially
if it is their first cluster.

Furthermore, if I were going to build a cluster today I would be hard
pressed not to look strongly at the AMD Opteron platform.  In 32-bit mode
Opterons are as fast as, if not faster than, most Xeon processors, and they
give you the ability to migrate your code over to 64-bit when you see fit.
But don't take my word for it; most vendors will be more than willing to
let you benchmark your codes on various system architectures if you ask.

File serving can easily be done by a Linux box via NFS on a cluster of this
size.  NFS will slow down I/O-bound applications, so they should use a
local scratch disk whenever possible.  I'd also recommend looking at
IDE-to-SCSI disk arrays, as they provide a ton (ok, on the order of
4 terabytes) of storage in 4U of space, yield decent read/write speeds of
around 60 MB/s, and won't break your budget.
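
A back-of-envelope sketch shows why local scratch matters for I/O-bound
jobs.  It assumes the array sustains the 60 MB/s quoted above, that the
file server has a single GigE link (125 MB/s raw, before protocol
overhead), and that all nodes read at once and share the bandwidth evenly.
These are assumptions for illustration, not measurements, and the function
names are made up.

```python
# Per-node share of one NFS server's bandwidth when every node reads at once.
# Assumed numbers: 60 MB/s array throughput (quoted above) and a single
# GigE server link at 1000 Mbit/s / 8 = 125 MB/s raw. Even sharing is an
# idealization; real NFS behavior under load is usually worse.

GIGE_RAW_MB_S = 1000 / 8  # 125 MB/s before protocol overhead

def per_node_mb_s(nodes, array_mb_s=60.0, link_mb_s=GIGE_RAW_MB_S):
    """Bandwidth each node sees; the slower of array and link is the limit."""
    bottleneck = min(array_mb_s, link_mb_s)
    return bottleneck / nodes

def seconds_to_read(size_mb, nodes):
    """Time for each node to read size_mb while all nodes read together."""
    return size_mb / per_node_mb_s(nodes)

for nodes in (1, 10, 15):
    print(f"{nodes:2d} nodes: {per_node_mb_s(nodes):.1f} MB/s each, "
          f"{seconds_to_read(1000, nodes):.0f} s to read 1000 MB")
```

Under these assumptions each of 15 nodes gets only a few MB/s from the
server, far less than a single local disk would deliver.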

When buying a cluster you must not forget to take into account the machine
room requirements it will have.  Do not underestimate the amount of heat
that even 10 dual-processor nodes produce; the room must be able to cool
them properly.

Manageability of the system must also be considered.  If possible, try to
budget for a serial console terminal server and some intelligent PDUs, as
they will save you time and headaches when things break.  A good server
deployment system such as FAI, OSCAR, ROCKS or warewulf-cluster will save
the administrator and the scientists time when servers fail.

happy shopping..

-mike

-- 
hanulec@hanulec.com				cell: 516.410.4478
https://secure.hanulec.com	      EFnet irc && aol im: hanulec

On Wed, 3 Dec 2003, Ross Boylan wrote:

> Although this list seems to have been quiet recently, perhaps there are
> some folks out there with wisdom to share.  I didn't turn up much in the
> archives.
>
> The group I am in is about to purchase a cluster.  If anyone on this
> list has any advice on what type of hardware (or software) would be
> best, I'd appreciate it.
>
> We will have two broad types of uses: simulation studies for
> epidemiology (with people or cases as the units) and genetic and protein
> studies, with which I am less familiar.  The simulation studies are
> likely to make heavy use of R.  I suspect that the two uses have much
> different characteristics, e.g., in terms of the size of the datasets to
> manipulate and the best tradeoffs outlined below.
>
> Other uses are possible.
>
> Among other issues we are wondering about:
> *Tradeoffs between CPU speed, memory, internode communication speed,
> disk size, and disk speed.
>
> As a first cut, I expect the simulations suggest emphasizing processor
> power and ensuring adequate memory.  On the other hand, the fact that
> it's easy to upgrade CPUs suggests putting more money into the network
> supporting the CPUs.  And I suspect the genomics emphasizes more the
> ability to move large amounts of data around quickly (across network and
> to disk).
>
> *Appropriate disk architecture (e.g., local disks vs shared network
> disks or SANS).
>
> 32 vs 64 bit; Intel vs AMD.
>
> We assume it will be some kind of Linux OS (we like Debian, but vendors
> tend to supply RH and Debian lacks support for 64 bit AMD in any
> official way, unlike Suse or RH).  If there's a good reason, we could
> use something else.
>
> Our budget is relatively modest, enough perhaps for 10-15 dual-processor
> nodes.  We hope to expand later.
>
> As a side issue, more a personal curiosity, why do clusters all seem to
> be built on dual-processor nodes?  Why not more CPU's per node?
>
> Thanks for any help you can offer.
>


