
Re: Large static datasets like genomes (Re: Reasonable maximum package size ?)

On 6 Jun 2007, at 12:00 pm, Andreas Tille wrote:

On Wed, 6 Jun 2007, Tim Cutts wrote:
... (some interesting points)

There were many valid points in your mail, but even if the issue
was raised using the example of biological data, it is a more general
issue for others as well.  It might be that we could:

   0. Find a solution for large data sets in general
   1. Find a solution for static biological data (I couldn't believe
      that all biological data are really changing that frequently).

Sadly, it does change that frequently. Data sets like the human genome are being constantly tweaked and re-evaluated in the light of more recent discoveries. New assemblies of the raw sequence are produced at irregular intervals, but additional data in the form of ESTs (expressed sequence tags, for the non-bioinformaticians that are still awake) and similar forms comes online all the time, and that's why Ensembl is completely rebuilt from scratch every two months. It used to be once a month, but now that we do more than 20 genomes, and not just human, we don't have enough human and silicon resources to do the whole thing that frequently any more.

   2. Find a solution that might make the kind of handling of
      dynamical data as you described more user friendly (bittorrent).

Bittorrent's main downside, as far as I can see, is that it tends to work best when there are lots of subscribers to the data, which isn't always the case. It also tends to require the sort of firewall rules which scare the pants off most network admins.

As an aside, several people in this list are interested in large
scale biological data, and you may or may not be aware

At least readers of the debian-med mailing list should be aware :) :


Cool, thanks for that.

As the message above says I would like to organise a Debian-Med
day where I would really like to discuss some things in a round
of people who are interested in medicine and microbiology.  It
would be great if you would join this.


  I have no idea whether
you will be in Edinburgh the whole time.  If not just suggest
a day where you would like to meet with others (in case you are

I'm not there for the whole thing; I arrive on Tuesday at about lunchtime, and will be leaving on Friday afternoon.

I personally would be mostly interested in top 4 (Maintaining our own
package repository).

Fine - the top listed things were the things that first spring to mind, and that I usually find myself talking about!


The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
