Re: Status bibref gatherer (Was: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically)

To: debian-med@lists.debian.org
Subject: Re: Status bibref gatherer (Was: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically)
From: Andreas Tille <andreas@an3as.eu>
Date: Fri, 9 Mar 2012 09:13:19 +0100
Message-id: <20120309081319.GC18868@an3as.eu>
In-reply-to: <20120309003707.GA4747@falafel.plessy.net>
References: <20120218184041.GI20647@an3as.eu> <20120223005555.GC17772@falafel.plessy.net> <20120223074500.GA8824@an3as.eu> <20120223085159.GA18124@falafel.plessy.net> <20120223091352.GE8824@an3as.eu> <20120223150437.GA20070@falafel.plessy.net> <20120223151953.GD15447@an3as.eu> <20120308200316.GA3788@an3as.eu> <20120309000614.GA4478@falafel.plessy.net> <20120309003707.GA4747@falafel.plessy.net>

On Fri, Mar 09, 2012 at 09:37:08AM +0900, Charles Plessy wrote:
> I think that we need to discuss the general syntax of the file.
> 
> Initially, I thought it as limited to "name: value" fields like in Debian control
> data files.  Then after adding many Reference-* fields, I found that syntax
> boring and introduced a trick, that hashes are permitted but that they are
> reduced to simple fields by concatenating the hash name to the field name.
> 
> Foo:
>  Bar: baz
> 
> is therefore the same as:
> 
> Foo-Bar: baz
> 
> I have documented this in the Debian wiki long time ago.
> 
>   http://wiki.debian.org/UpstreamMetadata

You are most probably referring to my recent changes in debian/upstream
files.  I would like to give some reasons for these changes:

  1. Typos in field names (keys)
     My importer script verifies a key in the references scope against
     a fixed list which is currently

    defined_fields = { 'article'   : 0,
                       'author'    : 0,
                       'booktitle' : 0,
                       'doi'       : 0,
                       'editor'    : 0,
                       'eprint'    : 0,
                       'issn'      : 0,
                       'journal'   : 0,
                       'license'   : 0,
                       'month'     : 0,
                       'number'    : 0,
                       'pages'     : 0,
                       'publisher' : 0,
                       'pmid'      : 0,
                       'title'     : 0,
                       'url'       : 0,
                       'volume'    : 0,
                       'year'      : 0,
                     }

     These are the keys currently used in debian/upstream files
     (with the exception of 'comment', which I'm not sure about).
     I'm using this list for two purposes:
       1. Make sure a key is not duplicated (thus the number)
       2. Issue a warning about unknown keys, which lets me easily
          catch typos or new keys which need discussion
     That way I was able to catch some typos.

  2. Duplicated keys
     The script ignores a duplicated key, but I removed the duplicates
     from the upstream files - we simply should not accept them.

  3. Unified syntax
     Nearly all debian/upstream files are using

      Foo:
        Bar: baz

     I found a few files that used

      Foo-Bar: baz

     and changed these just for esthetic reasons.
     I detected them because the Python yaml importer does NOT
     regard both as equivalent.  There might be some options to
     change this or just work around this programmatically.
     However, it seemed easier to me to simply change two
     debian/upstream files to a more esthetic layout (in my eyes)
     rather than changing the script.
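
To make the points above concrete, here is a minimal sketch of the two
checks from item 1 and of the "Foo: / Bar:" versus "Foo-Bar:" equivalence
from item 3.  It is not the actual importer script: the function names
check_reference_keys and flatten are hypothetical, the keys are assumed to
arrive as (key, value) pairs, lowercasing the key before lookup is an
assumption, and keeping the first value of a duplicated key is an
assumption as well.

```python
# Illustrative sketch only; NOT the actual importer script.
# Field names are taken from the defined_fields list above.
DEFINED_FIELDS = (
    'article', 'author', 'booktitle', 'doi', 'editor', 'eprint',
    'issn', 'journal', 'license', 'month', 'number', 'pages',
    'publisher', 'pmid', 'title', 'url', 'volume', 'year',
)


def check_reference_keys(pairs):
    """Check (key, value) pairs of one reference against the fixed list.

    Returns (accepted, warnings).  Unknown keys and duplicated keys are
    skipped and reported; the first value of a duplicated key is kept
    (an assumption made for this sketch).
    """
    seen = dict.fromkeys(DEFINED_FIELDS, 0)   # per-key counter
    accepted = {}
    warnings = []
    for key, value in pairs:
        name = key.lower()                    # assumption: case-insensitive
        if name not in seen:
            warnings.append("unknown key: %s" % key)
            continue
        seen[name] += 1
        if seen[name] > 1:
            warnings.append("duplicated key: %s" % key)
            continue
        accepted[name] = value
    return accepted, warnings


def flatten(mapping, prefix=''):
    """Reduce nested mappings to hyphenated keys.

    {'Foo': {'Bar': 'baz'}} becomes {'Foo-Bar': 'baz'}, so both layouts
    end up in the same form after parsing.
    """
    flat = {}
    for key, value in mapping.items():
        name = prefix + '-' + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat
```

Applying something like flatten to the parsed data would be one way to
work around the difference programmatically, if we ever prefer that over
editing the files.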

> We need to keep in mind that we will already meet a strong resistance to YAML
> (at least in terms of loudness; we know that sometimes it actually represents
> close to nothing), and therefore be careful to not go too far in directions
> that deviate from the name:value paradigm.

I'm not really sure what you mean here.  I was parsing > 70 existing
yaml files successfully.  The only changes I made were making them more
uniform - in other words making them conform more closely to what was
the accepted standard.

> This said, for me it is essential to use YAML.  You can see that each time
> some control data files change their contents, like in the Debian archive,
> things break because they are parsed ad-hoc and each field needs to be
> described on how it is parsed.  YAML totally solves this for the field
> syntax.  And (apart from libyaml-per) it has good parsers in many languages.

Sure.  We use YAML and use YAML parsers.  It seems that the Python
parser behaves a bit differently (see the ': ' issue) and also the
"Foo-Bar: baz" stuff seems to be handled a bit differently.  I can confirm
that I was able to parse all debian/upstream files in Debian Med
properly and I will soonish run a test on your preparation in SVN.
(BTW, I'm quite happy that I did my test preparation for the script
right in time to have it close to ready once the files are gathered
by you in SVN - thanks again for doing so.)

Kind regards

       Andreas.

-- 
http://fam-tille.de
