
Re: Large static datasets like genomes (Re: Reasonable maximum package size ?)



On Sunday 10 June 2007 17:20:54 you wrote:
> On 9 Jun 2007, at 11:27 am, Steffen Moeller wrote:
> > Once a (computational) biologist starts a new project, (s)he wants the
> > latest data no matter what and anything older than three months (or a
> > week sometimes) is likely not to be acceptable.
>
> Actually, my experience is that they tend to want diametrically opposite
> things, at the same time.
>
> 1)  When starting a new project, they usually want the very latest data.
> 2)  But they usually then want to keep that data static for the lifetime
>     of the project.

:o) Very true. For 1) I think that Debian packages for databases do not work.
They might well work for 2), though.

But ... how can one directly address a feature on the genome, across releases
of Ensembl, when it has no accession number because you have only just found it?

* base pairs and chromosome ID do not work across (NCBI) releases
* centiMorgans are too vague
* distances in bp relative to the nearest genomic marker? Not too bad,
  probably.
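
To make the last option concrete, here is a rough sketch in Python. Everything
in it is made up for illustration: the marker names, the positions and the two
"releases" are hypothetical, and it assumes that nothing was inserted or
deleted between the marker and the feature from one release to the next, which
is exactly the part that may not hold in practice.

```python
from bisect import bisect_left

# Hypothetical marker tables: name -> (chromosome, position in bp).
# Names and coordinates are invented, not taken from any real assembly.
release_a = {"MARKER_A": ("1", 1000), "MARKER_B": ("1", 5000)}
release_b = {"MARKER_A": ("1", 1200), "MARKER_B": ("1", 5250)}


def nearest_marker(chrom, pos, markers):
    """Return (marker_name, offset_bp) for the marker closest to pos."""
    candidates = sorted(
        (p, name) for name, (c, p) in markers.items() if c == chrom
    )
    if not candidates:
        raise ValueError(f"no markers on chromosome {chrom}")
    i = bisect_left(candidates, (pos, ""))
    # Only the neighbours on either side of pos can be the closest marker.
    neighbours = candidates[max(i - 1, 0):i + 1]
    marker_pos, name = min(neighbours, key=lambda m: abs(m[0] - pos))
    return name, pos - marker_pos


def resolve(name, offset, markers):
    """Turn a marker-relative address back into (chromosome, position)."""
    chrom, marker_pos = markers[name]
    return chrom, marker_pos + offset


# Address a feature found at chr 1, bp 4900 in the first release ...
name, offset = nearest_marker("1", 4900, release_a)
# ... and look it up again in the second release.
print(name, offset, resolve(name, offset, release_b))
```

The marker-relative pair (name, offset) is what one would write down instead
of the raw chromosome coordinate.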

The easiest indeed seems to be to keep the data on which the results were
computed, which is what was diagnosed as behaviour 2).  And 1) is done in order
to be close to up-to-date at least when the journal's reviewers inspect the
work :o)  I actually think that Debian packages can help at least with the
tools used for the analysis, since updating is technically easy ... unless
you have some Perl 5.0.x-specific code, that is. Ouch.

Many greetings

Steffen
 

