Re: data sets and/or access to data sets
Just a few cents. In the domain of neuroimaging we are also confronted
with the problem of distributing data. Several aspects become relevant
if someone is to package data "statically" (instead of fetching it via
some data-sharing framework) into a proper Debian package:
1. With a classical Debian package, large amounts of data get
duplicated in both the source and the binary packages.
Although this could be overcome by some means, for our domain of
interest http://neuro.debian.net/datasets.html provides data in both
binary and source packages, with the idea that non-Debian users can
still simply fetch the .orig.tar.gz if they need to get hold of the
data, e.g. separate tarballs per subject from
http://neuro.debian.net/debian/pool/main/h/haxby2001/
2. What is the appropriate license for data ;) In quite a few
jurisdictions data is not copyrightable per se at all, so common
licenses tailored toward software are not appropriate (even CC [1]).
The EU has sui generis database rights, while there is no similar
mechanism in the States afaik, which suggests that license terms need
to address such differences.
So when releasing/packaging data, a description of terms that is viable
in different jurisdictions should be attached, e.g., as recommended by
Hendrik Weimer on debian-legal [2]: the ODC Public Domain Dedication
and Licence (PDDL) [3].
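To illustrate point 1 for non-Debian users, here is a minimal sketch of how the pool directory for a data package could be located and its upstream tarballs picked out. It assumes the archive follows the usual Debian pool layout (pool/<component>/<prefix>/<source>/); the helper names and the file names in the usage note are made up for illustration and are not the actual contents of the archive.

```python
from urllib.parse import urljoin


def pool_prefix(source):
    """Pool subdirectory: first letter, or 'lib' plus one letter for lib* packages."""
    return source[:4] if source.startswith("lib") else source[:1]


def pool_url(base, source, component="main"):
    """Build the pool directory URL for a source package (assumed layout)."""
    base = base if base.endswith("/") else base + "/"
    return urljoin(base, f"pool/{component}/{pool_prefix(source)}/{source}/")


def orig_tarballs(filenames):
    """Keep only upstream tarballs, including per-component ones (.orig-<name>.tar.*)."""
    return sorted(n for n in filenames if ".orig" in n and ".tar." in n)
```

For example, pool_url("http://neuro.debian.net/debian", "haxby2001") gives the directory mentioned above, and orig_tarballs() applied to a (hypothetical) listing such as ["haxby2001_1.0.orig-subj1.tar.gz", "haxby2001_1.0-1.dsc"] keeps only the first entry. Fetching the files themselves is then a plain HTTP download.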
[1] http://bibwild.wordpress.com/2008/11/24/creative-commons-is-not-appropriate-for-data/
[2] http://lists.debian.org/debian-legal/2011/01/msg00049.html
[3] http://www.opendatacommons.org/licenses/pddl/1.0/
On Tue, 15 Feb 2011, Andreas Tille wrote:
> Hi Scott,
> I think your idea is quite reasonable in principle. As far as I
> understood (but I did not dive into this) the getData effort[1] is one
> step in this direction, and the soon-to-be-uploaded package Biomaj does
> something that might be helpful as well.
> Regarding actually building packages: there were several ideas in the
> past to have some data.debian.org archive which contains large data
> sets, where the packages you suggest would probably fit. However,
> to the best of my knowledge this has not yet been implemented for
> practical use.
> Do we want to take another shot at a Google Summer of Code project
> in this direction?
> Kind regards
> Andreas.
> [1] http://wiki.debian.org/getData
--
=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic