
Re: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically



On Mon, Feb 20, 2012 at 09:43:09AM +0100, Andreas Tille wrote:
>  
> there are tools which assemble information for Sources.gz files - I guess
> this could be implemented if say 20% of the packages will contain such a
> file.

In such a model, the packages need to be uploaded so that Sources.gz is
updated.  This is exactly what I aim to avoid by using Umegaya to feed
the UDD.

> > This is why I designed a push model.  After updating debian/upstream for the
> > package 'foo', visit http://upstream-metadata.debian.net/foo/YAML-URL, and
> > Umegaya will refresh its information.  (This will work after I transfer the
> > service to debian-med.debian.net; I really hope to do it this evening).
> 
> I admit I do not trust that a developer will really do regular visits to
> http://upstream-metadata.debian.net/foo/YAML-URL or any similar URL.

Note that anybody can trigger a refresh.  For instance, I ran this command to
load the upstream metadata for all the packages that are known to debcheckout
and recommended by one of our tasks.

for package in $(svn cat svn://svn.debian.org/blends/projects/med/trunk/debian-med/debian/control \
    | grep Recommends | sed -e 's/,//g' -e 's/|//g' -e 's/Recommends://g'); do
  curl "http://upstream-metadata.debian.net/$package/Name"
done

I can set up a cron job along these lines, in addition to VCS hooks.
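
Such a cron job could be sketched roughly as follows.  This is only an
illustration: the function names recommended_packages and refresh_packages
are hypothetical, and like the one-liner above it assumes that each
Recommends field fits on one line and carries no version constraints.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical, not part of Umegaya.

# Print one package name per line, taken from the Recommends fields of a
# control file read on standard input.
recommended_packages() {
    grep '^Recommends:' \
        | sed -e 's/^Recommends://' -e 's/[,|]/ /g' \
        | tr -s ' \t' '\n\n' \
        | sed -e '/^$/d' \
        | sort -u
}

# Ask Umegaya to refresh each package named on standard input, using the
# same URL pattern as the command above.
refresh_packages() {
    while read -r package; do
        curl -s "http://upstream-metadata.debian.net/$package/Name" > /dev/null
    done
}
```

It would be run as "svn cat <control file URL> | recommended_packages |
refresh_packages".  Sorting and deduplicating the names first avoids
requesting the same package twice when several tasks recommend it.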


> BTW, it came to my mind that we should also gather
> fields from debian/copyright if it is DEP5 compatible.  I specifically
> consider Upstream-Contact a very valuable field and at a later stage I
> would even ask for a lintian check "Upstream-Contact is missing" or
> something like this.

I actually opposed - with no success - the inclusion of the Upstream-Contact
and Upstream-Name fields in DEP 5, as they usually do not help in respecting
the package's redistribution terms, which is the purpose of the Debian
copyright file.

The debian/upstream file features Contact and Name fields that can be used
for the same purpose.

>   1. scripts/fetch_bibref.sh
>      fetches all available debian/upstream files and moves them to
>      /org/udd.debian.org/mirrors/upstream/package.upstream
>      I would like to stress the fact that I would fetch these
>      files *unchanged* as they are edited by the author
>   2. udd/bibref_gatherer.py
>      Just parse the upstream files for bibliographic information
>      and push them into UDD
>      This is the really cheap part of the job and I volunteer to
>      do this in one afternoon.

The problem with this approach is that it can only run on udd.debian.org,
which is quite loaded if I understand correctly.

Regardless of the means, I provide a table that can be downloaded daily and
loaded into the UDD.  That is how the gatherers work, as far as I have seen.
That the data passes through a Berkeley DB is just a detail.  It is as
unimportant as having the data processed with one programming language or
another.  What matters is the final product, the table to be loaded.

> However, regarding practical usage of these data I do not see
> an application currently.  You need a problem first which needs to be
> solved to invent something new.

The goals of the system are:

 - Let the maintainer update the data without uploading the package,

 - Gather data for our tasks pages.  In addition to the bibliography,
   I think that, while rare, the Registration and Donation fields
   can be very useful to better cooperate with Upstream.

http://upstream-metadata.debian.net/table/registration
http://upstream-metadata.debian.net/table/donation


>    dh_bibref
> 
> which turns debian/upstream data into a usable BibTeX database on the
> users system.  This is technically definitely not hard - it just needs
> to be *done*.

The challenge will be to have it run by default by Debhelper.  But
I think that it is indeed the right direction.  In the meantime, such
a tool will need to produce a reference that is stored in the directory.


>   A. Gather *all* existing debian/upstream files and making sure they
>      will be updated after at least 24h at a place where they can be
>      fetched for UDD (I explicitly do not mention that we should do this
>      via the web service and I would really prefer not to go the detour
>      of another database)

Currently I have the following cron job running on debian-med.debian.net:

@hourly for key in DOI PMID Reference-Author Reference-Eprint Reference-Journal Reference-Number Reference-Pages Reference-Title Reference-URL Reference-Volume Reference-Year References; do curl -s http://upstream-metadata.debian.net/yaml/$key; done > public_html/biblio.yaml

Therefore, the bibliographic data can now be accessed at the following URL.

http://upstream-metadata.debian.net/~plessy/biblio.yaml

[You may need to wait a bit for the DNS to propagate the new
IP for upstream-metadata.debian.net]
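
Since the cron job above redirects straight into public_html/biblio.yaml, a
download that happens during the run may see a partial file.  One possible
refinement, sketched here with a hypothetical helper named
publish_atomically, is to write to a temporary file first and rename it once
the output is complete and non-empty.

```shell
#!/bin/sh
# Sketch only: publish_atomically is a hypothetical helper.

# Copy standard input to the file named by $1 through a temporary file in
# the same directory, so that readers never see a partially written file.
# An empty input leaves the previous file untouched and returns non-zero.
publish_atomically() {
    target=$1
    tmp=$(mktemp "$target.XXXXXX") || return 1
    if cat > "$tmp" && [ -s "$tmp" ]; then
        chmod 644 "$tmp"      # mktemp creates the file with mode 0600
        mv "$tmp" "$target"   # rename within one directory is atomic
    else
        rm -f "$tmp"
        return 1
    fi
}
```

The loop from the cron job would then end with
"done | publish_atomically public_html/biblio.yaml" instead of the
redirection.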

Let's see how it goes before deciding to redo everything from scratch with a
new design.

Cheers,

-- 
Charles

