
Re: Large static datasets like genomes (Re: Reasonable maximum package size ?)

On Wed, 6 Jun 2007, Tim Cutts wrote:
... (some interesting points)

There were many valid points in your mail, but even though the issue
was raised using biological data as an example, it is a more general
issue that affects others as well.  Perhaps we could:

   0. Find a solution for large data sets in general.
   1. Find a solution for static biological data (I can't believe
      that all biological data really change that frequently).
   2. Find a solution that makes handling dynamic data of the kind
      you described more user-friendly (e.g. BitTorrent).

> As an aside, several people in this list are interested in large
> scale biological data, and you may or may not be aware

At least readers of the debian-med mailing list should be aware :) :

> that I'm
> giving a presentation at debconf about what we're doing at the Sanger
> Institute, and how Debian fits in.  I imagine some of you would like
> to come to that talk, and if you wish to contact me off list to
> suggest things you particularly want me to talk about, then please

As the message above says, I would like to organise a Debian-Med
day where I would really like to discuss some topics with a group
of people who are interested in medicine and microbiology.  It
would be great if you could join us.  I have no idea whether
you will be in Edinburgh the whole time.  If not, just suggest
a day when you would like to meet the others (in case you are

> Some of the topics I could cover:
>
> 1)  Management of a thousand node cluster (choice of hardware,
> automated installation, configuration management, monitoring)
> 2)  Parallel filesystems (Lustre, GPFS, PVFS etc)
> 3)  Scalability issues in genomic analysis (especially in the
> software which builds and then presents http://www.ensembl.org)
> 4)  Maintaining our own package repository
> 5)  Migration from Tru64 to Debian
> 6)  Multipath SAN access, failover and so on
> 7)  Approaches to job scheduling on large clusters
> 8)  Problems with MySQL at this scale
>
> Feel free to suggest to me things that you'd find interesting to talk

I personally would be most interested in topic 4 (Maintaining our own
package repository).

Kind regards