Re: [DRAFT FOR REVIEW] Sanger Institute Debian cluster 320 TB swap 1.5PB storage
I have added more tweaks.
Phil Butcher wrote:
I have reviewed the text and made small modifications for it to
be more accurate.
I hope this is OK.
From: Andre Felipe Machado [mailto:email@example.com]
Sent: 04 March 2008 02:09
Cc: firstname.lastname@example.org; email@example.com; Phil Butcher
Subject: [DRAFT FOR REVIEW] Sanger Institute Debian cluster 320 TB swap
Please review the attached draft for errors and possible improvements.
The most updated draft version is maintained, and rendered, at .
The target publishing date is March 6th, 2008, 12:00 GMT, and corrections
should be submitted to the debian-publicity list before then.
Andre Felipe Machado
(anyone can post to the list, but only subscribers will receive messages)
Tony Cox firstname.lastname@example.org
Sanger Institute www.sanger.ac.uk
Wellcome Trust Genome Campus Head, Seq. Informatics
Hinxton, Cambs. CB10 1SA Tel: +44 1223 834244
<h1>Wellcome Trust Sanger Institute, UK, uses a Debian cluster with a 320 TB
HP-SFS (Lustre) filesystem as part of its 1.5 PB storage for human genome sequencing</h1>
The <a href="http://www.sanger.ac.uk/">Wellcome Trust Sanger Institute</a>,
<a href="http://en.wikipedia.org/wiki/South_Cambridgeshire">South Cambridgeshire</a>,
uses a 640+ core Debian GNU / Linux cluster
with 320 terabytes of "live data" storage that works like a giant memory swap partition.
Each of the 27 new-technology, robotic, computerized genome sequencers generates 1 TB of image data every three days,
at a rate of 2 MB/s during a 3-day run.
This data needs to stay "live" during the sequencing and initial analysis.
Combined with the processing needs of the scientific software running on
<a href="http://www.debian.org/users/org/sangerinstitute.en.html">the Debian GNU
/ Linux 640+ core cluster</a>, this means the "swap-like" storage needs to provide 320 TB of space.
Antony Cox, PhD, the Head of Sequencing Informatics, and Phil Butcher, the Head
of IT at the institute, gave
<a href="http://www.guardian.co.uk/technology/2008/feb/28/research.computing">an interview</a>
to The Guardian, presenting the 1000 Genomes Project.
The project aims to accurately sequence one thousand individual human genomes,
to map all of the differences that occur in 0.5% or more of the population sampled, and
to identify the places involved in the interactions between multiple DNA bases
that cause different conditions.
Given that human DNA has 3 billion bases, and each sampled base must be sequenced
between 11 and 30 times to factor out measurement
errors, this is one of the biggest computational biology efforts of today.
The project is unique not only because it deals with 1.5 PB of storage, but
also because it keeps 320 TB of "swap-like storage" available for fast comparisons and calculations.
According to Butcher, genomics research is shifting its focus away from the laboratory of glass tubes
and becoming more informatics-focussed. The Sanger Institute started using Debian GNU / Linux when the world
discovered how reliable and useful it can be.
Now the institute has to compete with commercial organisations that use Linux to hire system administrators
able to manage large clusters with large-scale distributed filesystems.
You may read
<a href="http://www.guardian.co.uk/technology/2008/feb/28/research.computing">the interview</a>
for more details.
<h2>About the Wellcome Trust Sanger Institute</h2>
<a href="http://www.sanger.ac.uk/">The Wellcome Trust Sanger Institute</a>
is one of the world's largest centres for DNA sequencing and analysis. It made
the largest single contribution to the sequence of the
<a href="http://www.sanger.ac.uk/HGP/">Human Genome Project</a>,
contributed approximately 25% of the
<a href="http://www.sanger.ac.uk/Projects/M_musculus/">mouse genome sequence</a>,
is finishing the
<a href="http://www.sanger.ac.uk/Projects/D_rerio/">zebrafish genome sequence</a>
and is contributing to other model organism sequences, such as
the nematode
<a href="http://www.sanger.ac.uk/Projects/C_elegans/">C. elegans</a>.
Institute researchers have also contributed to the sequence of more than 60
finished genomes of bacterial pathogens, such as Salmonella typhi, TB, MRSA and
C. difficile, as well as parasites such as those causing malaria, African
trypanosomiasis and leishmaniasis.
<a href="http://www.sanger.ac.uk/Info/News-releases/2007/071206.shtml">New-technology sequencing</a>
will dramatically increase the breadth and depth of genome analysis in humans,
model organisms and pathogens.
You can contact the Wellcome Trust Sanger Institute Press Team
<h2>About the Debian Project</h2>
<p>Debian GNU / Linux is one of a family of
<a href="http://www.debian.org/intro/free">free libre</a> operating systems
(GNU/Linux, GNU/Hurd, GNU/NetBSD, GNU/kFreeBSD)
developed by more than two thousand
<a href="http://asdfasdf.debian.net/~tar/bugstats/?8">volunteers</a> from
<a href="http://www.debian.org/devel/developers.loc">all over the world</a> who
<a href="http://www.debian.org/devel/">collaborate</a> via the
internet on the <a href="http://www.debian.org">Debian Project</a>.</p>
<p>Debian's dedication to
<a href="http://www.debian.org/intro/free">Free Libre Open Source Software</a>, its
non-profit nature, and its
<a href="http://vote.debian.org/">open</a>
governance make it
<a href="http://www.debian.org/doc/manuals/project-history/">a first</a>
among free libre operating system distributions.</p>
<p>The Debian project's key strengths are
<a href="http://www.debian.org/devel/people">its volunteer base</a>,
its dedication to the
<a href="http://www.debian.org/social_contract">Debian Social Contract</a>,
and its <a href="http://wiki.debian.org/WhyDebianForDevelopers">commitment</a>
to provide the best operating systems attainable, following a
strict quality <a href="http://www.debian.org/doc/debian-policy">policy</a>,
working with an established
<a href="http://qa.debian.org/">QA Team</a>.</p>
<p>You can contribute to the Debian Project
<a href="http://wiki.debian.org/DebianForNonCoderContributors">even without being a programmer</a>,
by becoming a development or service
<a href="http://www.debian.org/partners/">partner</a> company or institution in the
<a href="http://www.debian.org/partners/partners">Debian Partner Program</a>,
or simply by making
<a href="http://www.debian.org/donations">donations</a> to the Debian Project.</p>
<p>Debian Project news, press releases and press coverage can be found
on the official Debian wiki
<a href="http://wiki.debian.org/News">page</a>. For PR enquiries, contact the
<a href="http://lists.debian.org/debian-publicity">debian-publicity list</a>.</p>