
Re: [DRAFT FOR REVIEW] Debian GNU/Linux top 34 supercomputer at 32.8 TFlops for detecting gravitational waves at Max Planck Institute for Gravitational Physics



Here is the comment from the native speaker:

Hi Carsten,

It was just too hard for me to edit that Wiki page. It's unreadable in
its raw form. Instead, attached is a plain-text version of the article
with all corrections made. Except for the very last paragraph which is
too awful to edit and should be re-written or deleted. There were many
mistakes, so I guess you need to go through carefully and make the
relevant changes.

Cheers,

Martin

---------------------------------------------------------------

Debian GNU/Linux powers a 32.8 TFlops supercomputer, equivalent to #34
on the TOP500 list, for detecting gravitational waves at the Max Planck
Institute for Gravitational Physics

The Observational Relativity and Cosmology Research Group is a team of
scientists working at the Hannover branch of the Max Planck Institute
for Gravitational Physics (Albert Einstein Institute) in Hannover,
Germany. Their goal is the direct detection of gravitational waves,
which were first predicted by Albert Einstein. They work with friends
and colleagues in the LIGO Scientific Collaboration and the Virgo
Collaboration.

The heavy computing is done on ATLAS, a 1342-node cluster running
Debian GNU/Linux. With more than 10 TB of RAM, approximately 1.3 PB of
storage and a specialised network able to transfer almost four days'
worth of DVD video per second (2880 Gb/s), the cluster achieves a
measured Linpack performance (the benchmark used by top500.org) of
32.8 TFlops, against a theoretical peak of about 50 TFlops. This would
place ATLAS 4th in Germany, 11th in Europe and 34th worldwide on the
November 2007 top500.org list, at a cost of only EUR 1.8 million
(about US$ 2.8 million).

The ATLAS cluster consists of 1342 Supermicro compute nodes (Intel Xeon
3220 quad-core, 2.4 GHz, 8 GB RAM, 500 GB Hitachi HDD, IPMI remote
management), along with 31 data servers (2x Intel Xeon E5345, 2.33 GHz,
16 GB RAM, Areca 1261ML RAID controller, 16x 750 GB Hitachi HDD) plus
4 similar head nodes with 4x 750 GB HDD. All of these run Debian
GNU/Linux 4.0 "Etch" with a few modifications, such as a custom kernel
and the Condor queuing system. Additional storage space is provided by
13 Sun Fire X4500 servers running Solaris 10. The system was built from
off-the-shelf computers supplied by a German company, Pyramid Computer
GmbH.
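
The quoted peak and memory figures can be roughly reproduced from the
node specifications. The Python sketch below is a back-of-the-envelope
check, not the group's own calculation; it assumes 4 double-precision
floating-point operations per core per clock cycle for the Xeon 3220
(a figure not stated in the article) and counts only the compute
nodes' RAM.

```python
# Back-of-the-envelope check of the ATLAS figures quoted above.
# ASSUMPTION (not from the article): each Xeon 3220 core retires 4
# double-precision flops per cycle (2-wide SSE add + 2-wide SSE multiply).

NODES = 1342            # compute nodes
CORES_PER_NODE = 4      # quad-core Xeon 3220
CLOCK_GHZ = 2.4         # Xeon 3220 clock speed
FLOPS_PER_CYCLE = 4     # assumed DP flops per core per cycle
RAM_PER_NODE_GB = 8
LINPACK_TFLOPS = 32.8   # measured Linpack result from the article


def peak_tflops(nodes, cores, ghz, flops_per_cycle):
    """Theoretical peak in TFlops: nodes * cores * GHz * flops/cycle / 1000."""
    return nodes * cores * ghz * flops_per_cycle / 1000.0


peak = peak_tflops(NODES, CORES_PER_NODE, CLOCK_GHZ, FLOPS_PER_CYCLE)
total_ram_tb = NODES * RAM_PER_NODE_GB / 1000.0   # decimal TB, compute nodes only
efficiency = LINPACK_TFLOPS / peak                # measured / theoretical

print(f"cores:      {NODES * CORES_PER_NODE}")
print(f"peak:       {peak:.1f} TFlops")
print(f"RAM:        {total_ram_tb:.1f} TB")
print(f"efficiency: {efficiency:.0%}")
```

With these inputs the sketch gives 5368 cores, a peak of about
51.5 TFlops (in line with "about 50 TFlops"), 10.7 TB of compute-node
RAM and a Linpack efficiency of roughly 64%.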

One of the cluster's many hardware specialities is its network from
Woven Systems, a hierarchical, fully non-blocking design. The EFX 1000
core switch has 144 10 Gb/s CX4 ports and currently connects to 32
TRX100 edge switches, each with 48 1 Gb/s ports and 4x 10 Gb/s
uplinks; the core's 144 x 10 Gb/s ports, counted in both directions,
give the aggregate of 2880 Gb/s. In addition, the Sun Fire X4500
machines are connected directly to the core switch.
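
The network numbers can be cross-checked with a few lines of
arithmetic. This Python sketch assumes that the 2880 Gb/s figure counts
the core switch's 144 x 10 Gb/s ports in both directions (full duplex),
and it also works out what average video bitrate the "four days of DVD
video per second" comparison implies.

```python
# Sanity check of the ATLAS network figures quoted above.
# ASSUMPTION (not stated in the article): 2880 Gb/s is the core
# switch's 144 x 10 Gb/s ports counted in both directions.

CORE_PORTS = 144          # EFX 1000 10 Gb/s CX4 ports
CORE_PORT_GBPS = 10
EDGE_SWITCHES = 32        # TRX100 edge switches
EDGE_ACCESS_PORTS = 48    # 1 Gb/s ports per edge switch
EDGE_UPLINKS = 4          # 10 Gb/s uplinks per edge switch
COMPUTE_NODES = 1342

core_one_way_gbps = CORE_PORTS * CORE_PORT_GBPS     # 1440 Gb/s per direction
core_full_duplex_gbps = 2 * core_one_way_gbps       # 2880 Gb/s
uplink_ports_used = EDGE_SWITCHES * EDGE_UPLINKS    # core ports taken by edge uplinks
free_core_ports = CORE_PORTS - uplink_ports_used    # left for directly attached hosts
access_ports = EDGE_SWITCHES * EDGE_ACCESS_PORTS    # total 1 Gb/s host ports

# Moving 4 days (345,600 s) of playback in one second implies this
# average bitrate per second of video.
four_days_s = 4 * 24 * 3600
implied_mbps = core_full_duplex_gbps * 1000 / four_days_s

print(f"core capacity:   {core_full_duplex_gbps} Gb/s")
print(f"free core ports: {free_core_ports}")
print(f"1 Gb/s ports:    {access_ports} for {COMPUTE_NODES} compute nodes")
print(f"implied bitrate: {implied_mbps:.1f} Mb/s")
```

Under this assumption the edge uplinks occupy 128 core ports, leaving 16
(enough for the 13 X4500 servers if each uses a single port), the edge
switches offer 1536 access ports for the 1342 compute nodes, and the
DVD comparison corresponds to about 8.3 Mb/s, below the DVD-Video
maximum of roughly 10 Mb/s.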

Experience with Debian GNU/Linux on the ATLAS, Merlin and Morgane
supercomputing clusters

The ATLAS cluster was designed and built, and is managed, by Dr.
Henning Fehrmann and Dr. Carsten Aulbert, who have both been using
Debian GNU/Linux for years.

ATLAS's sister systems in Potsdam, Germany, "Merlin" and "Morgane",
also run Debian GNU/Linux and have been managed by Dr. Steffen
Grunewald for many years. "The experience with them has been very,
very good," according to Dr. Aulbert.

According to Dr. Grunewald, the 180-node Merlin Beowulf cluster
(launched in 2002) initially ran an RPM-based distribution, but
migrated to Debian GNU/Linux in 2004 after the vendor of that
distribution changed its licensing model. The total computing power of
its 360 CPU cores has been estimated at more than 1.3 TFlops peak; its
data storage capacity is about 20 TB, mirrored.

The Morgane Beowulf cluster, consisting of 615 compute nodes, 15
storage nodes and several head nodes, was launched in December 2006.
The total computing power of its 1230 CPU cores has been estimated at
more than 6 TFlops peak, and its data storage capacity is about 100 TB.
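
Both clusters work out to two CPU cores per compute node (360 cores on
180 nodes, 1230 cores on 615 nodes). Dividing the quoted peak estimates
by the core counts gives a lower bound on the peak per core; the short
Python sketch below uses only the numbers given in the text.

```python
# Lower bounds on peak performance per core for Merlin and Morgane,
# from the figures quoted above ("more than" 1.3 and 6 TFlops).


def per_core_gflops(peak_tflops: float, cores: int) -> float:
    """Peak GFlops per core implied by a cluster-wide peak in TFlops."""
    return peak_tflops * 1000 / cores


# name: (compute nodes, CPU cores, estimated peak in TFlops)
clusters = {
    "Merlin": (180, 360, 1.3),
    "Morgane": (615, 1230, 6.0),
}

for name, (nodes, cores, peak) in clusters.items():
    print(f"{name}: {cores // nodes} cores/node, "
          f">= {per_core_gflops(peak, cores):.1f} GFlops peak per core")
```

This gives at least 3.6 GFlops per core for Merlin and at least
4.9 GFlops per core for Morgane.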

"Actually, with Red Hat and its Anaconda/Kickstart installer, I had a
single master kickstart file for the different types of machines (in
terms of both hardware and function), which was run through cpp with
the appropriate defines set to produce the actual kickstart file for a
specific set-up. While this meant maintaining only a single copy of the
installation code, FAI [Fully Automatic Installation] with its class
model was a major breakthrough in readability, functionality and
maintainability. There's no way back now," said Dr. Grunewald.
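
The contrast Dr. Grunewald draws can be illustrated with a toy model.
With the cpp approach, one master file holds every variant behind
#ifdef blocks; with a class model, each host gets an ordered list of
classes and picks up the small fragment belonging to each class. In
FAI, the hostname is itself a class, alongside classes such as DEFAULT.
The Python below only sketches that idea: the hostname scheme, class
names and package lists are hypothetical, and it does not reproduce
FAI's actual file layout or code.

```python
from typing import Dict, List

# HYPOTHETICAL per-class package fragments: each stays small and
# readable instead of living in one big #ifdef-laden master file.
PACKAGES: Dict[str, List[str]] = {
    "DEFAULT": ["openssh-server", "ntp"],
    "COMPUTE": ["gfortran"],
    "STORAGE": ["nfs-kernel-server"],
    "HEADNODE": ["apache2"],
}


def classes_for(hostname: str) -> List[str]:
    """Return the ordered class list for a host (hypothetical naming
    scheme: n* = compute node, d* = data server, h* = head node)."""
    classes = ["DEFAULT"]
    role = {"n": "COMPUTE", "d": "STORAGE", "h": "HEADNODE"}.get(hostname[:1])
    if role:
        classes.append(role)
    classes.append(hostname)  # the host itself as the most specific class
    return classes


def package_list(hostname: str) -> List[str]:
    """Merge the fragments of all classes, in class order, without duplicates."""
    packages: List[str] = []
    for cls in classes_for(hostname):
        for pkg in PACKAGES.get(cls, []):
            if pkg not in packages:
                packages.append(pkg)
    return packages


for host in ("n0001", "d01", "h1"):
    print(host, classes_for(host), package_list(host))
```

In this model, adding a new machine type means adding one class and its
fragment, rather than another set of #ifdef branches in a shared file.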

Besides FAI, there are other useful tools for the large-scale
installation, deployment and management of Debian GNU/Linux machines in
a variety of scenarios.

"Debian offers an extremely large set of packages, which makes it THE
distro of choice for sparing us the hassle of packaging the software we
need ourselves," said Dr. Aulbert.

"Also, Thomas Lange's FAI package is extremely useful for the automatic
deployment of Debian [GNU/Linux]. For example, without much tweaking
and using only two hosts, we were able to reinstall the whole cluster
in about 2.5 hours, and were limited only by those two servers' network
connections."
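
Because the reinstall was limited by the two servers' network
connections, the time bounds how much data each node could have
received. The article does not give the servers' link speed, so the
Python sketch below uses a hypothetical 1 Gb/s per server purely for
illustration; it treats the links as the only bottleneck.

```python
# Upper bound on the data each node could have received during the
# 2.5-hour reinstall, if the two install servers' links were the only
# bottleneck. LINK_GBPS_PER_SERVER is a HYPOTHETICAL value; the article
# does not say how the servers were connected.

NODES = 1342
SERVERS = 2
LINK_GBPS_PER_SERVER = 1.0   # hypothetical
HOURS = 2.5


def per_node_gigabytes(nodes: int, servers: int, link_gbps: float, hours: float) -> float:
    """GB per node that the servers' combined links can deliver in `hours`."""
    total_gigabits = servers * link_gbps * hours * 3600
    return total_gigabits / 8 / nodes  # gigabits -> gigabytes, then per node


print(f"{per_node_gigabytes(NODES, SERVERS, LINK_GBPS_PER_SERVER, HOURS):.2f} GB per node")
```

Under the 1 Gb/s assumption this comes to about 1.68 GB per node; the
result scales linearly with the link speed.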

"Two weeks ago I would have written something about the very good
security support, and given how well the OpenSSL issue was handled, I
still could. But in reality we don't need security updates except on
exposed machines such as the head nodes; everything else is only
visible internally."

As additional benefits of using Debian GNU/Linux, he cited:

- the simplicity of creating your own packages
- how easily repositories can be set up (the cluster uses reprepro)
- clean build environments (pbuilder and similar packages)
- and, of course, the superb packaging infrastructure in general (dpkg,
  APT, aptitude, Synaptic and many useful APT tools)

By using Debian GNU/Linux on its clusters, the Observational Relativity
and Cosmology Research Group has reduced the work needed on hardware
and software infrastructure compared with scientific clusters running
other distributions, allowing it to focus on its objective of detecting
gravitational waves.

"Personally, I prefer community distros, since they offer more
long-term stability than a distro that is driven by the need to release
often to generate revenue. On the downside, it would be better for us
to have a more settled release plan and/or some kind of 'stable and
supported' backports [for the specific software we use]," said Dr.
Aulbert.

The Debian Project is currently refining its release process to reach
a more regular release cycle of 18 months for what is the largest
officially maintained and security-supported distribution ever
produced (more than 24,000 packages). As of May 2008, the next release
is on track.

The Debian Backports site has been actively maintained for five years
by Debian Developers, who are the only people allowed to upload
packages to it. Requests for an official backport that is not yet
available can be filed as a wishlist bug in the Debian Bug Tracking
System, and can include the patches the package's official Debian
maintainer needs to backport it from Testing, or even Unstable, to
Stable.

Beyond the prompt response of its Security Team to issues in released
packages, the Debian Project is discussing on its developers' mailing
list how to improve its auditing and quality-assurance processes, so
that security and quality issues in such a large set of packages can be
caught at a very early stage of development.

---------------------------------------------


I'll be out of town till Monday, so please try to incorporate some of
the changes.

Thanks

Carsten

