Re: Large static datasets like genomes (Re: Reasonable maximum package size ?)
On 6 Jun 2007, at 12:00 pm, Andreas Tille wrote:
On Wed, 6 Jun 2007, Tim Cutts wrote:
... (some interesting points)
There were many valid points in your mail, but even though the issue
was raised using the example of biological data, it is a more general
issue for others as well. It might be that we could:
0. Find a solution for large data sets in general
1. Find a solution for static biological data (I couldn't believe
that all biological data are really changing that frequently).
Sadly, they are. Data sets like the human genome are being constantly
tweaked and re-evaluated in the light of more recent discoveries.
New assemblies of the raw sequence are produced at irregular
intervals, but additional data in the form of ESTs (expressed
sequence tags, for the non-bioinformaticians that are still awake)
and similar forms comes online all the time, and that's why Ensembl
is completely rebuilt from scratch every two months. It used to be
once a month, but now that we do more than 20 genomes, and not just
human, we don't have enough human and silicon resources to do the
whole thing that frequently any more.
2. Find a solution that might make the kind of handling of
dynamical data as you described more user friendly (bittorrent).
Bittorrent's main downside, as far as I can see, is that it tends to
work best when there are lots of subscribers to the data, which isn't
always the case. It also tends to require the sort of firewall rules
which scare the pants off most network admins.
As an aside, several people in this list are interested in large
scale biological data, and you may or may not be aware
At least readers of the debian-med mailing list should be aware :) :
Cool, thanks for that.
As the message above says I would like to organise a Debian-Med
day where I would really like to discuss some things in a round
of people who are interested in medicine and microbiology. It
would be great if you would join this.
I have no idea whether
you will be in Edinburgh the whole time. If not just suggest
a day where you would like to meet with others (in case you are
I'm not there for the whole thing; I arrive on Tuesday at about
lunchtime, and will be leaving on Friday afternoon.
I personally would be mostly interested in top 4 (Maintaining our own
Fine - the top listed things were the things that first spring to
mind, and that I usually find myself talking about!
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.