
Re: Advice on cluster hardware



Hi,

I'd focus on the application first. Depending on which app you run (and
how much you are willing to modify it), different choices make sense. If
you have the time, get a couple of machines, run your apps on them, and
monitor memory, CPU and network while they run. With the monitoring
tools running alongside the app you can tell which resource is the
bottleneck, and so what you need most. Remember that adding more nodes
can have a negative, non-linear effect on the network.
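
To make the monitoring step concrete, here is a rough sketch of the kind of sampler I mean. It assumes a Linux node with the usual /proc files (/proc/stat, /proc/meminfo, /proc/net/dev); the function names, the 5 second interval and the "eth0" interface are only my placeholders, and ready-made tools such as vmstat or sar will give you the same numbers with less work.

```python
import time


def cpu_times(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal (guest fields skipped)
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            total = sum(values)
            return total - idle, total
    raise ValueError("no aggregate cpu line found")


def cpu_utilization(before, after):
    """Fraction of time the CPUs were busy between two cpu_times() samples."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return busy / total if total > 0 else 0.0


def mem_used_fraction(meminfo_text):
    """Fraction of RAM in use, from the text of /proc/meminfo."""
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    total = info["MemTotal"]
    # Older kernels have no MemAvailable; fall back to MemFree.
    free = info.get("MemAvailable", info["MemFree"])
    return (total - free) / total


def net_bytes(netdev_text, iface):
    """Return (rx_bytes, tx_bytes) counters for iface from /proc/net/dev."""
    for line in netdev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            return int(fields[0]), int(fields[8])
    raise KeyError(iface)


def read(path):
    with open(path) as handle:
        return handle.read()


def monitor(interval=5.0, iface="eth0"):
    """Print CPU, memory and network load every `interval` seconds (Linux only)."""
    cpu_prev = cpu_times(read("/proc/stat"))
    net_prev = net_bytes(read("/proc/net/dev"), iface)
    while True:
        time.sleep(interval)
        cpu_now = cpu_times(read("/proc/stat"))
        net_now = net_bytes(read("/proc/net/dev"), iface)
        rx = (net_now[0] - net_prev[0]) / interval
        tx = (net_now[1] - net_prev[1]) / interval
        print("cpu %.0f%%  mem %.0f%%  rx %.0f B/s  tx %.0f B/s" % (
            100 * cpu_utilization(cpu_prev, cpu_now),
            100 * mem_used_fraction(read("/proc/meminfo")),
            rx, tx))
        cpu_prev, net_prev = cpu_now, net_now
```

Start monitor() on each test node while the application runs. CPU pinned near 100% with a quiet network points toward faster processors; an idle CPU with a busy interface points toward the interconnect.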

As for the CPU, in my experience it depends on what you do: integer,
float or double arithmetic? I've seen some applications fly on G4s while
others crawl, depending on whether they do floats or ints. Memory
bandwidth is another thing you might be interested in: if you sweep
through large data sets in memory, having a fast path to memory helps.
And of course, if your code is optimized for MMX (or SSE or AltiVec),
your choice of CPU is already spoken for.

So far I've seen very expensive quad-processor machines and very
inexpensive dual-processor ones ... So unless your whole problem fits
into one machine with 4 processors, I don't think quads are cost
effective. Unless we are talking about supercomputers ...

Just my 2 cents.

graziano


On Wed, Dec 03, 2003 at 07:44:38PM -0500, Jeffrey B. Layton wrote:
> Ross Boylan wrote:
> 
> >Although this list seems to have been quiet recently, perhaps there are
> >some folks out there with wisdom to share.  I didn't turn up much in the
> >archives.
> >
> >The group I am in is about to purchase a cluster.  If anyone on this
> >list has any advice on what type of hardware (or software) would be
> >best, I'd appreciate it.
> >
> >We will have two broad types of uses: simulation studies for
> >epidemiology (with people or cases as the units) and genetic and protein
> >studies, with which I am less familiar.  The simulation studies are
> >likely to make heavy use of R.  I suspect that the two uses have much
> >different characteristics, e.g., in terms of the size of the datasets to
> >manipulate and the best tradeoffs outlined below.
> >
> 
> Is the code MPI at all? I've only looked at R in passing
> so I don't know if it's parallel or not.
> 
> >Other uses are possible.
> >
> >Among other issues we are wondering about:
> >*Tradeoffs between CPU speed, memory, internode communication speed,
> >disk size, and disk speed.
> >
> 
> Let's talk about your apps first.
> 
> >As a first cut, I expect the simulations suggest emphasizing processor
> >power and ensuring adequate memory.  On the other hand, the fact that
> >it's easy to upgrade CPUs suggests putting more money into the network
> >supporting the CPUs.  And I suspect the genomics emphasizes more the
> >ability to move large amounts of data around quickly (across network and
> >to disk).
> >
> 
> This is true. However, there is also a tradeoff between $ for
> network and buying more nodes. GigE isn't too bad right now.
> Otherwise IB is priced right, although for your system size
> it might be a little costly.
> 
> >*Appropriate disk architecture (e.g., local disks vs shared network
> >disks or SANs).
> >
> 
> Well, you might want to think about a nice size NAS box for
> central storage. I would recommend a local disk unless there
> is an overriding reason not to (such as security, reliability,
> heat, even $ to a small degree). I know a cool cluster distribution
> but it's not debian based (could be in the future though - maybe
> that's a good idea for a future project).
> 
> Let's continue to talk over the list and let other people
> chime in as well.
> 
> Jeff
> 
> 
> >32 vs 64 bit; Intel vs AMD.
> >
> >We assume it will be some kind of Linux OS (we like Debian, but vendors
> >tend to supply RH and Debian lacks support for 64 bit AMD in any
> >official way, unlike Suse or RH).  If there's a good reason, we could
> >use something else.
> >
> >Our budget is relatively modest, enough perhaps for 10-15 dual-processor
> >nodes.  We hope to expand later.
> >
> >As a side issue, more a personal curiosity, why do clusters all seem to
> >be built on dual-processor nodes?  Why not more CPU's per node?
> >
> >Thanks for any help you can offer.
> > 
> >

-- 
+-----------------------+--------------------------+
| Graziano Obertelli    | CS Dept. Rm 102          |
| graziano@cs.ucsb.edu  | University of California |
| (805) 893-5212        | Santa Barbara, CA 93106  |
+-----------------------+--------------------------+


