
Re: Packaging scientific datasets for Debian



On Sat, May 08, 2010 at 12:57:17PM -0500, Steve M. Robbins wrote:
> On Sat, May 08, 2010 at 10:05:45AM -0400, Michael Hanke wrote:
> > - some grouping by purpose (although many datasets can be used for
> >   different things) or type of data (MRI, pictures, sound
> >   databases, genome, ...)
> 
> I like all these suggestions.  I don't have a strong position on how
> to arrange the data, but my gut feeling is to use a single root under
> which it is grouped by purpose as you suggest, leading to
> /usr/share/data/<group>, for example.

I'm a little unsure about the level of sophistication that the grouping
should reflect. Consider this data:

  http://data.pymvpa.org/datasets/haxby2001/

It is an fMRI experiment that also comes with anatomical MRI images.
Where should it go?

/usr/share/data/mri/fmri/haxby2001/<subjects>

It is all MRI data, but its primary purpose is fMRI. But what happens
if the dataset also has behavioral data (e.g. recorded reaction times)?
Would it be split into:

/usr/share/data/mri/fmri/haxby2001
/usr/share/data/behavioral/reactiontimes/haxby2001

Sounds complicated and not very useful, because one typically wants to
have all related data in one location.

What if the data comes with the actual stimulus images (simple JPEGs)?
It would still be an MRI dataset, but people looking for a database of
images would probably never look into /usr/share/data/mri...

Maybe we should avoid grouping datasets altogether. People working with
MRI data probably tend to have MRI datasets installed, hence their
/usr/share/data will be all MRI-related anyway. Maybe the above dataset
could be installed under:

/usr/share/data/haxby2001-objectcategorization

The contained file formats would be encoded using debtags, and the rest
of the useful categorization information would go into the package
description.
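
To make the flat layout a bit more concrete, here is a rough Python
sketch of the idea: every dataset gets exactly one directory directly
under the root, and all categorization lives in tags rather than in the
path. The tag names and the second dataset are made up for illustration;
they are not actual debtags facets or real packages, and a real lookup
would query debtags instead of a hard-coded dict.

```python
from pathlib import PurePosixPath

# Single root, as proposed in this thread.
DATA_ROOT = PurePosixPath("/usr/share/data")

# Hypothetical tag vocabulary, for illustration only; real debtags
# facets would have to be agreed upon separately.
DATASETS = {
    "haxby2001-objectcategorization": {
        "format::mri", "format::jpeg", "use::fmri", "use::behavioral",
    },
    # Hypothetical second dataset, not an existing package.
    "example-imagedb": {"format::jpeg"},
}


def install_path(name):
    """Flat layout: one directory per dataset, no grouping levels."""
    return DATA_ROOT / name


def find_by_tag(tag, datasets=DATASETS):
    """Return the install paths of all datasets carrying the given tag."""
    return sorted(
        str(install_path(name))
        for name, tags in datasets.items()
        if tag in tags
    )
```

With this, someone looking for images would call
find_by_tag("format::jpeg") and get both directories, including the MRI
dataset, without the data having to be split across several trees.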

What do you think?

Michael

-- 
GPG key:  1024D/3144BE0F Michael Hanke
http://mih.voxindeserto.de

