
data sets and/or access to data sets



Hello,

I wonder if anybody has thought about providing large data sets, such as genomes and microarray data, as Debian "packages", so that users can easily get those data sets onto their machines and use them with the various tools? I can think of many great ways this would be useful.

For example, suppose a user has high-throughput sequencing data that they need to align to a genome. Debian already has a tool called bowtie that will do the job, but the user first needs to 1) download the genome and 2) generate the bowtie index. Wouldn't it be great if you could just type:

apt-get install bowtie-human-genome-index

which would install the genome and the pre-built indexes, so the user could run bowtie directly.

Another example: if you want to run your own BLAST searches, why not have a package with the BLAST database indexes, installed with:

apt-get install blast-human-genome

A nice side effect is that all of these data sets could be kept in a shared, system-wide location such as /usr/share, so all of the tools could use the same copy instead of duplicating it, and the data would be available to every user on the system. Right now every user has to figure out how to manage their data individually, which can be difficult for biologists.
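To make the shared-directory idea a bit more concrete, here is a rough Python sketch of how a tool or wrapper script might look for an installed index: system-wide locations first, then a private per-user copy. Everything in it is an assumption for illustration: the directory /usr/share/bowtie-indexes, the per-user fallback under ~/.local/share, the one-directory-per-index layout and the find_index helper are all hypothetical, and this is not how bowtie itself locates its indexes.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical search path: a system-wide directory that a package such as
# "bowtie-human-genome-index" might populate, followed by a per-user fallback.
DEFAULT_SEARCH_DIRS = (
    Path("/usr/share/bowtie-indexes"),
    Path.home() / ".local" / "share" / "bowtie-indexes",
)


def find_index(name: str,
               search_dirs: Sequence[Path] = DEFAULT_SEARCH_DIRS) -> Optional[Path]:
    """Return the first directory holding an index called `name`, or None.

    The shared (system-wide) directory is checked first, so every user and
    every tool picks up the single installed copy before any private one.
    """
    for base in search_dirs:
        candidate = base / name
        if candidate.is_dir():
            return candidate
    return None
```

A wrapper could call find_index("hg19") (the index name is just an example) and, when it returns None, suggest installing the corresponding package instead of asking each user to download and build the index themselves.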

What do you think?

Scott

