
Re: Large static datasets like genomes (Re: Reasonable maximum package size ?)

On 10 Jun 2007, at 6:38 pm, Steffen Moeller wrote:

On Sunday 10 June 2007 17:20:54 you wrote:
On 9 Jun 2007, at 11:27 am, Steffen Moeller wrote:
Once a (computational) biologist starts a new project, (s)he wants the
latest data no matter what and anything older than three months (or a
week sometimes) is likely not to be acceptable.

Actually, my experience is that they tend to want diametrically opposite
things, at the same time.

1) When starting a new project, they usually want the very latest data.
2) But they usually then want to keep that data static for the lifetime
   of the project.

:o) very true. For 1) I think that Debian packages for databases do not work.
They might well work for 2), though.

... except that they usually want several versions present at once, which
would mean a completely separate package name for each release.  Ick.
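
To make the "several versions present at once" point concrete, here is a minimal sketch of how side-by-side releases might be laid out and pinned per project. The directory naming (`<dataset>-<release>`) and both helper names are assumptions made up for illustration; nothing in this thread or in Debian prescribes them.

```python
from pathlib import Path
from typing import List


def release_dir(root: Path, dataset: str, release: int) -> Path:
    """Return the directory for one specific release of a dataset.

    Assumes a hypothetical layout of one directory per release,
    named "<dataset>-<release>", under a common root.
    """
    return root / f"{dataset}-{release}"


def installed_releases(root: Path, dataset: str) -> List[int]:
    """List, in ascending order, the releases of a dataset found under root."""
    prefix = f"{dataset}-"
    found = []
    for entry in root.iterdir():
        if not entry.is_dir() or not entry.name.startswith(prefix):
            continue
        suffix = entry.name[len(prefix):]
        if suffix.isdigit():  # ignore anything that is not "<dataset>-<number>"
            found.append(int(suffix))
    return sorted(found)
```

A project would record the release number it started with and always resolve its data through `release_dir`, so installing a newer release next to it changes nothing for that project. Mapped onto packages, each such directory would need its own package name, which is exactly the "Ick" above.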

But ... how can one directly access a feature on the genome that has no accession number because you have just found it across releases of Ensembl?

* base pairs and chromosome ID do not work across (NCBI) releases
* centiMorgans are too vague
* distances in bp relative to the nearest genomic marker? Not too bad,

The easiest approach does indeed seem to be to keep the data on which the
results were computed, which is what was diagnosed as behaviour 2).
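
The third option in the list above (distance in bp relative to the nearest genomic marker) could be sketched roughly as follows. The marker names, the positions, and both function names are invented for illustration, and the sketch assumes that the same markers exist in both releases and that the sequence between the marker and the feature is unchanged.

```python
from typing import Dict, Tuple

# Marker name -> position in bp on one chromosome, for one release.
Markers = Dict[str, int]


def to_marker_offset(pos: int, markers: Markers) -> Tuple[str, int]:
    """Express an absolute position as (nearest marker, signed offset in bp)."""
    name = min(markers, key=lambda m: abs(pos - markers[m]))
    return name, pos - markers[name]


def from_marker_offset(name: str, offset: int, markers: Markers) -> int:
    """Turn (marker, offset) back into an absolute position in a given release."""
    return markers[name] + offset


# Made-up marker positions for the same chromosome in two releases.
old_release = {"M1": 1000, "M2": 5000}
new_release = {"M1": 1200, "M2": 5350}

# A feature found at 4900 bp in the old release, re-addressed in the new one.
marker, offset = to_marker_offset(4900, old_release)       # ("M2", -100)
new_pos = from_marker_offset(marker, offset, new_release)  # 5250
```

The offset only survives a release change if nothing was inserted or deleted between the marker and the feature, which is why this is "not too bad" rather than a full answer.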

Oh, yes, I'm not saying the requirement isn't *reasonable*. It just makes life difficult!


