Re: Advice on cluster hardware
Ross Boylan wrote:
Although this list seems to have been quiet recently, perhaps there are
some folks out there with wisdom to share. I didn't turn up much in the
The group I am in is about to purchase a cluster. If anyone on this
list has any advice on what type of hardware (or software) would be
best, I'd appreciate it.
We will have two broad types of uses: simulation studies for
epidemiology (with people or cases as the units) and genetic and protein
studies, with which I am less familiar. The simulation studies are
likely to make heavy use of R. I suspect that the two uses have much
different characteristics, e.g., in terms of the size of the datasets to
manipulate and the best tradeoffs outlined below.
Is the code MPI-based at all? I've only looked at R in passing,
so I don't know whether it runs in parallel or not.
Other uses are possible.
Among other issues we are wondering about:
*Tradeoffs between CPU speed, memory, internode communication speed,
disk size, and disk speed.
Let's talk about your apps first.
As a first cut, I expect the simulations suggest emphasizing processor
power and ensuring adequate memory. On the other hand, the fact that
it's easy to upgrade CPUs suggests putting more money into the network
supporting the CPUs. And I suspect the genomics emphasizes more the
ability to move large amounts of data around quickly (across network and
This is true. However, there is also a tradeoff between money spent
on the network and money spent on more nodes. GigE (Gigabit Ethernet)
isn't too bad right now. Otherwise IB (InfiniBand) is reasonably
priced, although for your system size it might be a little costly.
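To make that tradeoff concrete, here is a rough sketch in Python of
the arithmetic I have in mind. The function name and every dollar
figure below are placeholders I made up for illustration; they are
not vendor quotes, so plug in your own numbers.

```python
def affordable_nodes(budget, node_cost, network_cost_per_node):
    """Return how many whole nodes fit in the budget when each node
    also carries its share of the interconnect cost (NIC, cable,
    switch port)."""
    per_node_total = node_cost + network_cost_per_node
    if per_node_total <= 0:
        raise ValueError("per-node cost must be positive")
    return int(budget // per_node_total)


# Hypothetical figures only, chosen to show the shape of the tradeoff.
budget = 60000
node_cost = 4000
cheap_net = 100    # assumed per-node cost of a commodity interconnect
fast_net = 1000    # assumed per-node cost of a faster interconnect

print(affordable_nodes(budget, node_cost, cheap_net))
print(affordable_nodes(budget, node_cost, fast_net))
```

With these made-up inputs the faster network costs you a couple of
nodes. Whether that is worth it depends on how much your codes
actually communicate.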
*Appropriate disk architecture (e.g., local disks vs shared network
disks or SANs).
Well, you might want to think about a decent-sized NAS box for
central storage. I would recommend a local disk in each node unless
there is an overriding reason not to (such as security, reliability,
heat, or even cost to a small degree). I know a cool cluster
distribution, but it's not Debian-based (it could be in the future,
though; maybe that's a good idea for a future project).
Let's continue to talk over the list and let other people
chime in as well.
*32 vs 64 bit; Intel vs AMD.
We assume it will be some kind of Linux OS (we like Debian, but vendors
tend to supply RH, and Debian lacks support for 64-bit AMD in any
official way, unlike SUSE or RH). If there's a good reason, we could
use something else.
Our budget is relatively modest, enough perhaps for 10-15 dual-processor
nodes. We hope to expand later.
As a side issue, more a personal curiosity, why do clusters all seem to
be built on dual-processor nodes? Why not more CPUs per node?
Thanks for any help you can offer.