
Re: Using BibRef from upstream-metadata.yaml (Was: Multiple publication data in upstream-metadata.yaml)



Hi Andreas,

Sorry, the data was actually rotten for multiple reasons.  First, the machine
running upstream-metadata.debian.net no longer had deb-src entries in its
sources.list, so debcheckout stopped working, and my rudimentary scripts did
not catch the error.  I added error-catching to the TODO list.  Second, when
packages change their repository URL, which is not supposed to happen often,
they have to be refreshed by hand.  Third, I hardcoded the erroneous git URL
git://git.debian.org/git, which we now correct to git://git.debian.org/.

I have reloaded the data from scratch, by deleting the database and
running the following command for each package med-bio depends on.

  curl http://upstream-metadata.debian.net/$package/YAML-URL
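
For illustration, the same loop can be sketched in Python with error
reporting added.  This is only a sketch: the helper names yaml_url and
refresh are hypothetical, and the list of packages has to be supplied by
the caller (it is not derived from med-bio here).

```python
import urllib.request

BASE_URL = "http://upstream-metadata.debian.net"


def yaml_url(package):
    """Build the per-package URL used in the curl command above."""
    return f"{BASE_URL}/{package}/YAML-URL"


def refresh(packages, fetch=urllib.request.urlopen):
    """Request the URL of each package; return a dict of failures.

    Errors are collected instead of being silently ignored, so that a
    broken refresh is visible to the caller.
    """
    failures = {}
    for package in packages:
        try:
            with fetch(yaml_url(package)) as response:
                response.read()
        except OSError as error:  # URLError and HTTPError derive from OSError
            failures[package] = str(error)
    return failures
```

Calling refresh(["perlprimer"]) would then return an empty dict on success,
or a mapping from package name to error message otherwise.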

I am now injecting all the fields related to bibliography.  By the way, I
regret having put PMID and DOI outside the Reference-* namespace.
Would you mind if I corrected this?

The file used for injection, http://upstream-metadata.debian.net/for_UDD/biblio.yaml,
is valid YAML; this is why I managed to write the loader.  It
is a series of records, each of which contains an array of three fields.
Altogether, they are loaded as a table of three columns.

upstream-metadata.debian.net stores its data in a Berkeley database, where the
field names are the concatenation of the package name and the
upstream-metadata.yaml field name.  For example, if the perlprimer package
has “PMID: 15073005”, the Berkeley DB will contain “15073005” for the field
“perlprimer:PMID”.  In the whole information chain, the structure is always
‘package - field - value’.
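
To make the naming scheme concrete, here is a small Python sketch.  A plain
dict stands in for the Berkeley DB, and the function names (make_key,
split_key, to_rows) are hypothetical, not taken from the server code.

```python
def make_key(package, field):
    """Concatenate package and field name, as in "perlprimer:PMID"."""
    return f"{package}:{field}"


def split_key(key):
    """Split on the first colon; Debian package names cannot contain one."""
    package, field = key.split(":", 1)
    return package, field


def to_rows(store):
    """Turn the flat key/value store into (package, field, value) rows."""
    return [split_key(key) + (value,) for key, value in sorted(store.items())]


# A dict used here as a stand-in for the Berkeley database.
store = {make_key("perlprimer", "PMID"): "15073005"}
rows = to_rows(store)
```

Each element of rows is then one ‘package - field - value’ triple, ready to
be written out as a three-column record.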

I do not know where the perlprimer duplication came from.  Perhaps there was
an invisible character somewhere?  On the server side, there is a command-line
tool to manipulate field values directly, and I may have made a typo when
running tests.  That said, I agree that the output should be sanitized.  Also,
I definitely agree to use PRIMARY KEY (package,key) as an extra safety net.
Should it be added to udd/sql/bibref.sql?
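
As a rough illustration of what the constraint buys us, here is a Python
sketch using an in-memory SQLite database as a stand-in for the real one.
The table and column names are assumptions made for the example; the actual
schema is whatever udd/sql/bibref.sql defines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bibref (
        package TEXT NOT NULL,
        "key"   TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (package, "key")
    )
    """
)


def insert_row(conn, package, key, value):
    """Insert one row; return False if (package, key) is already present."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bibref VALUES (?, ?, ?)", (package, key, value)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, a second insert for the same package and key is rejected
at load time instead of producing a duplicated row.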

Cheers,

-- 
Charles

