Re: HPC with Beoqulf

To: debian-beowulf@lists.debian.org
Subject: Re: HPC with Beoqulf
From: Fabricio Cannini <fcannini@yahoo.com.br>
Date: Fri, 14 Jan 2011 21:08:23 -0200
Message-id: <AANLkTimHVouA6THTaOp_ZfKk3g8jazSX9jee3SbP0iyo@mail.gmail.com>
In-reply-to: <201101141005.42166.carsten.aulbert@aei.mpg.de>
References: <AANLkTikEcBcwQSyD0rx5x7NZB7DbK5EjFDYKqCh6XbGt@mail.gmail.com> <201101140947.45937.carsten.aulbert@aei.mpg.de> <AANLkTikBrcbZPx+4u0w6MpgLii-cBjnjx0NLP+5JvuQ_@mail.gmail.com> <201101141005.42166.carsten.aulbert@aei.mpg.de>

Hi Patrick, Hi Carsten.

What you are trying to accomplish is certainly doable, Patrick. I have
done it myself, and it is a great experience.
I'll answer Carsten's questions inline so that you can see what I have done.

> On Friday 14 January 2011 09:56:43 Patrick Schmid wrote:
>> If I understood it correctly, the Beowulf project is just a bundle of
>> software to build a cluster.
>> So I'm referring to some software which allows me to build a Linux HPC cluster.
>
> Yes and no :)
>
> You first need to identify what you want/need:
>
> * what is the (typical) problem you want to solve?

Molecular dynamics, DNA analysis and the like. Most of these codes use
MPI to run in parallel across more than one node.

> * what software do you need for that,
> do you need a batch scheduler or do you have very few users which work at the same place
> and share the cluster without technical measures?

Here we use the TORQUE scheduler
[http://www.clusterresources.com/products/torque-resource-manager.php],
which is available in the squeeze repositories. It is an essential part
of the system, as there are 100+ users on it.
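
As a rough illustration of what a TORQUE job for one of these MPI codes can look like, here is a small Python sketch that generates a batch script. The job name, the program `./md_simulation`, the walltime and the 2 x 8-core node request are hypothetical. `#PBS -N`, `#PBS -l nodes=...:ppn=...`, `#PBS -l walltime=...`, `$PBS_O_WORKDIR` and `$PBS_NODEFILE` are standard TORQUE features, but the exact `mpirun` flags depend on the MPI implementation you install, so treat that line as an assumption.

```python
def make_torque_script(job_name: str, nodes: int, cores_per_node: int,
                       walltime: str, command: str) -> str:
    """Return the text of a TORQUE (PBS) batch script for an MPI run."""
    total_ranks = nodes * cores_per_node
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        # Request `nodes` hosts with `cores_per_node` cores each.
        f"#PBS -l nodes={nodes}:ppn={cores_per_node}",
        f"#PBS -l walltime={walltime}",
        # TORQUE starts jobs in $HOME; go back to the submission directory.
        "cd $PBS_O_WORKDIR",
        # $PBS_NODEFILE lists the allocated hosts, one line per requested core.
        # Assumption: an mpirun that accepts -np and -machinefile.
        f"mpirun -np {total_ranks} -machinefile $PBS_NODEFILE {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 2 nodes x 8 cores, 12 hours, placeholder binary.
    print(make_torque_script("md_run", nodes=2, cores_per_node=8,
                             walltime="12:00:00", command="./md_simulation"))
```

Saved to a file (for example a hypothetical md_run.pbs), the script would be submitted with `qsub md_run.pbs`. Many sites simply keep hand-written templates instead; generating them is only a convenience.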

> * think about the OS (Debian is a good choice here ;))

Agreed. :)
I've been using squeeze since I began researching this (April 2010),
and it's been great.
All, really all, of the software I needed to set up a scientific
computing cluster was already packaged, just an 'aptitude install' away.
The scientific applications themselves you will probably need to
compile, as we did with all the ones used here.

> * Think about the compute hardware, you probably need a login node, execute nodes and a file server,
> do you need many local cores or are the problems too large to fit into a few nodes?

A file server is nice, but if the cluster will be small you can get by
with a beefed-up login node.
Of course, as the machine grows, you will feel the need to separate
tasks: a stand-alone installation server, a backup/redundant storage
server, round-robin login nodes, you name it. The cluster I set up is
very homogeneous: it is composed of 1 login node, 1 storage node and
22 processing nodes, with Ethernet and InfiniBand DDR connections.
All nodes have two quad-core Xeon 5400-series processors each.
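
For job sizing it helps to count the compute cores: 22 processing nodes with two quad-core processors each give 22 x 2 x 4 = 176 cores. A trivial Python sketch of that arithmetic follows. The 16-core job size is a hypothetical example, and the login and storage nodes are assumed not to run jobs.

```python
def compute_cores(nodes: int, sockets_per_node: int, cores_per_socket: int) -> int:
    """Total cores available for jobs on the processing nodes."""
    return nodes * sockets_per_node * cores_per_socket


cores = compute_cores(nodes=22, sockets_per_node=2, cores_per_socket=4)
job_size = 16  # hypothetical MPI job size (2 nodes x 8 cores)
print(cores)              # 176
print(cores // job_size)  # 11 such jobs can run at the same time
```
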
As Carsten said, the kind of processing node will depend on the kind
of problem that you are trying to solve. If your tasks are naturally
parallel, you can use a larger number of lower-frequency cores, like
the AMD Opteron 6000 series. If your problem is not so
parallelism-friendly, you will need higher-frequency processors (or
faster memory access, if your problem is memory-bound), so you would
go for the Intel Xeon 5600 or 7500 series. There's a lot to think
about just in choosing the processors for the nodes.

> Then you need to look into networking (Infiniband or high performance Ethernet),
> is the software susceptible to latency and/or bandwidth available......

My rule of thumb is:
Gigabit Ethernet for single- or multi-threaded programs run on a single
machine, with little or no traffic between $output_dir and the
processing node, and with output files of at most a few GB.
If your software can use more than one node (e.g. through MPI) to
parallelize the task(s), is latency-sensitive (like CFD or FEM codes)
and/or writes really big output files (tens of GB and upwards), I'd go
with 10 Gigabit Ethernet or InfiniBand.
Currently, InfiniBand is the best choice in bandwidth, latency and,
perhaps, price (as in a lower price than 10G Ethernet), but software
has to be built to use it natively (e.g. an MPI library with
InfiniBand support). 10 Gigabit Ethernet, on the other hand, needs no
such changes, reportedly has decent latency, and its price is falling
quickly.
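
The rule of thumb above can be written down as a small decision function. This is only a sketch in Python: the `Workload` fields are hypothetical, and the 10 GB cut-off is an assumed value standing in for "tens of GB".

```python
from dataclasses import dataclass


@dataclass
class Workload:
    multi_node_mpi: bool     # spreads one job over several nodes, e.g. via MPI
    latency_sensitive: bool  # e.g. tightly coupled CFD or FEM codes
    output_size_gb: float    # typical size of the output files


# Assumed threshold for "really big" output files.
BIG_OUTPUT_GB = 10.0


def suggest_interconnect(w: Workload) -> str:
    """Any one of these factors is treated as enough to need a fast network."""
    if w.multi_node_mpi or w.latency_sensitive or w.output_size_gb >= BIG_OUTPUT_GB:
        return "10 Gigabit Ethernet or InfiniBand"
    return "Gigabit Ethernet"


print(suggest_interconnect(Workload(False, False, 2.0)))  # Gigabit Ethernet
print(suggest_interconnect(Workload(True, True, 50.0)))   # 10 Gigabit Ethernet or InfiniBand
```

Treating each factor as sufficient on its own is one reading of the "and/or"; a stricter reading would require several of them at once.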

> You see there are MANY questions to look at first, before you even want to
> start installing the machines :)

I went with FAI (Fully Automatic Installation) for the automated
installation, and it is a superb piece of software.
You will have to spend some time learning it and tuning it to do
exactly what you want, writing some scripts and such, because it is not
a "Next, Next, Finish" installation system like, say, Rocks Linux. I
liked it; IMHO it is a much more powerful solution than the
Red Hat/CentOS/Rocks/Fedora installer (Anaconda), but YMMV.

> Try describing a "typical" problem, the sizes involved, that may help a lot.
>
> Cheers
>
> Carsten
