
Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)



On Mon, Oct 26, 2009 at 04:01:57PM +0100, Andreas Tille wrote:
>  
> > > CREATE TABLE upstream-metadata (
> > >     package text,
> > >     key1    text,
> > >     key2    text,
> > >     ...
> > >     keyN    text,
> > >     PRIMARY KEY package
> > > );
> >  
> > > CREATE TABLE upstream-metadata (
> > >     package text,
> > >     key     text,
> > >     value   text,
> > >     PRIMARY KEY (package,key)
> > > );

Hello everybody,

A couple of months later, I am able to provide the data in either format.

My short-term goal is to transfer to the UDD bibliographic information from the
blends ‘tasks’ files, that are used to generate the ‘web sentinels’, via the
VCS-managed source packages and the upstream-metadata.debian.net aggregator.

See for instance the following URLs. The first is the ‘tasks’ file. Near its
beginning, the bibliographic reference for the package treeviewx is given. It
is used as a template to generate the ‘Please cite:’ information in the second
URL, the ‘web sentinel’. Now I have copied (and expanded) it in the treeviewx
source package itself, which is in the Debian Med subversion repository (third
URL).  This information is refreshed by the upstream-metadata.debian.net
aggregator each time it is specifically queried for the treeviewx package (a
cache of one minute prevents accidental server harassment). For example, the
fourth URL retrieves the title of the publication about treeviewx. A special
URL retrieves all the reference titles that are known in the system (fifth
URL).

http://svn.debian.org/viewsvn/blends/projects/med/trunk/debian-med/tasks/bio?revision=2042&view=markup
http://debian-med.alioth.debian.org/tasks/bio#treeviewx
http://svn.debian.org/viewsvn/debian-med/trunk/packages/treeviewx/trunk/debian/upstream-metadata.yaml?revision=4415&view=markup
http://upstream-metadata.debian.net/treeviewx/Reference-Title
http://upstream-metadata.debian.net/table/Reference-Title

Currently the web sentinels use five pieces of information: the PubMed ID, the
digital object identifier (DOI), the year of publication, the authors and the
title of the article in which the packaged software is described. Therefore,
concatenating the output of the following five URLs provides a table whose
structure is the same as in the second SQL example above:

http://upstream-metadata.debian.net/table/DOI
http://upstream-metadata.debian.net/table/PMID
http://upstream-metadata.debian.net/table/Reference-Year
http://upstream-metadata.debian.net/table/Reference-Title
http://upstream-metadata.debian.net/table/Reference-Author
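
To illustrate the concatenation, here is a minimal Python sketch. It assumes
that each per-key table has already been fetched and parsed into
(package, value) pairs; the actual output format of the aggregator is not
described in this message, so the parsing step is left out. The name
concatenate_long and the sample values are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the parsed output of the five per-key tables.
# The values are placeholders, not real bibliographic data.
PER_KEY_TABLES: Dict[str, List[Tuple[str, str]]] = {
    "DOI": [("treeviewx", "<doi>")],
    "PMID": [("treeviewx", "<pmid>")],
    "Reference-Year": [("treeviewx", "<year>")],
    "Reference-Title": [("treeviewx", "<title>")],
    "Reference-Author": [("treeviewx", "<authors>")],
}


def concatenate_long(
    tables: Dict[str, List[Tuple[str, str]]]
) -> List[Tuple[str, str, str]]:
    """Tag every (package, value) pair with its key and stack the results.

    The result has the (package, key, value) shape of the long table and
    respects its PRIMARY KEY (package, key) constraint.
    """
    rows: List[Tuple[str, str, str]] = []
    seen = set()
    for key, pairs in tables.items():
        for package, value in pairs:
            if (package, key) in seen:
                raise ValueError(f"duplicate entry for {package}/{key}")
            seen.add((package, key))
            rows.append((package, key, value))
    return sorted(rows)


LONG_ROWS = concatenate_long(PER_KEY_TABLES)
```

Each element of LONG_ROWS then corresponds to one row of the long table.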

I have not found a nice simple UNIX tool to convert this to a multi-column
table suitable for the first SQL example. Nevertheless, it can be done easily
with R (see http://upstream-metadata.debian.net/scripts/umeta-reshape.r).
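
For readers without R, the same kind of long-to-wide reshape can be sketched in
a few lines of Python. This is not a port of the R script mentioned above, only
an illustration under the assumption that the long data is available as
(package, key, value) tuples; the function name long_to_wide is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def long_to_wide(
    rows: Iterable[Tuple[str, str, str]],
    keys: Sequence[str],
) -> List[Tuple[Optional[str], ...]]:
    """Pivot (package, key, value) rows into one row per package.

    Each output row is (package, value_for_keys[0], value_for_keys[1], ...).
    A key that is missing for a package yields None, which would map to
    NULL in a wide SQL table. Rows whose key is not in `keys` are ignored.
    """
    wide: Dict[str, Dict[str, Optional[str]]] = {}
    for package, key, value in rows:
        if key not in keys:
            continue
        record = wide.setdefault(package, {k: None for k in keys})
        record[key] = value
    return [
        (package, *[wide[package][k] for k in keys])
        for package in sorted(wide)
    ]
```

For example, calling long_to_wide(rows, ["DOI", "PMID"]) returns one tuple per
package, with the DOI in the second position and the PMID in the third.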

So, which format would be preferable: the wide one (first SQL example) or the
long one (second SQL example)? For the moment, I have placed one file in each
format at the following URLs:

http://upstream-metadata.debian.net/for_UDD/biblio.long
http://upstream-metadata.debian.net/for_UDD/biblio.wide

They are refreshed daily by a simple script
(http://upstream-metadata.debian.net/scripts/biblio-for-UDD.sh).


Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

