
Re: data sets and/or access to data sets



Well -- this issue is tangentially related to the software question: why
should we care about having Debian packages when there are CRAN,
easy_install, etc. -- all those great tools to deploy software, domain
specific and created by specialists?  Although the comparison is a
stretch, I think it has its merits.  Encapsulating data (at least core
sets) into Debian packages integrates them nicely with the software
within Debian: there are clear and uniform means to specify
dependencies on data, to install it, and to find legal information,
and the same canonical location serves related software and data.
Versioned dependencies become especially relevant when constructing
regression tests for software that depends on the corresponding data
packages, e.g.
http://neuro.debian.net/pkgs/fsl-feeds.html.
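
As a rough illustration of such a versioned dependency, a stanza in
debian/control for a regression-test package could look like the one
below.  The package names and version numbers are made up for the
example and are not taken from any real package:

```
Package: sometool-tests
Architecture: all
Depends: sometool (>= 4.1), sometool-tests-data (= 4.1-1)
Description: regression tests for sometool
 Runs sometool against one specific release of its reference data.
```

The strict "=" on the data package is what ties a given set of expected
test results to the exact data release they were computed from.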

I am not suggesting that we replace all those data provider systems
created by professionals ;)  I am talking about complementing them
whenever feasible/sensible for Debian's needs and purposes.

On Tue, 15 Feb 2011, Scott Christley wrote:


> I think putting the data itself into the Debian repository is problematic.  Regardless of any licensing issue, the sheer amount of data is too great.  Better to leave it to the professionals who are getting paid to manage the data (NCBI, KEGG, etc.) and download directly from those sites.  Pretty much all of them have ftp/http access to acquire data.

> I like the getData effort.  Have a set of "data descriptors" with information about how/where to get data, then when requested performs the download.  This is very much the architecture I was thinking about.  I see a number of ways the project could be expanded.  I would like to hear thoughts from Steffen and Charles about getData before I jump in with a bunch of additions.
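
To make the "data descriptor" idea concrete, here is a minimal sketch in
Python.  It is not getData's actual format or code: the DataDescriptor
class, its fields, and the fetch() function are all hypothetical names
chosen for illustration, and the checksum step is my own assumption
about what such a descriptor might usefully carry.

```python
import hashlib
import shutil
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class DataDescriptor:
    """Hypothetical record describing how/where to get one data file."""
    name: str                     # local file name to store the data under
    url: str                      # where the upstream provider publishes it
    sha256: Optional[str] = None  # expected checksum, if one is known


def fetch(desc: DataDescriptor, dest_dir: Path) -> Path:
    """Download desc.url into dest_dir on request; reuse an existing copy."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / desc.name
    if not target.exists():
        # Only hit the provider's site when there is no local copy yet.
        with urllib.request.urlopen(desc.url) as response, \
                open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    if desc.sha256 is not None:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != desc.sha256:
            # Do not keep a file that does not match the descriptor.
            target.unlink()
            raise ValueError(f"checksum mismatch for {desc.name}")
    return target
```

The data itself stays with the upstream provider; only the small
descriptor would need to be shipped and kept up to date.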

> The biomaj project looks interesting as well.  One possibility is to use it as the underlying data retrieval layer, but it also may be "too complex" for basic retrieval functions.

> Scott
-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic

