
Re: Validating debian/upstream/metadata for debian-med projects



On Thu, Feb 11, 2021 at 03:16:51PM +0200, Andrius Merkys wrote:
> Hello,
> 
> Recent thread on debian-science@ [1] motivated me to look deeper into
> enforcing quality standards of debian/upstream/metadata files (a.k.a.
> DEP 12) we ship with Debian packages. I learnt that lintian already runs
> YAML syntax check on debian/upstream/metadata files, but further
> validation is not performed (to my knowledge). Thus I have developed a
> formal validation tool [2] to check the contents inside these YAML
> files, mostly syntax of URLs and some fields that are defined to be in
> correspondence to BibTeX as per [3].
> 
> Yesterday I have downloaded debian/upstream/metadata files from all
> >1300 projects under https://salsa.debian.org/debian-med/ and run
> against my validator. Resulting validation messages could be grouped
> into the following categories:
> 
> 1. Highly possible typos: reference year '200' (bagpipe), '20015'
> (rambo-k), URLs with spaces (bio-tradis) and so on. This category is the
> one I was actually aiming at.

That's absolutely cool!  Thanks a lot for this!  I think this category
should be a lintian error.
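
To make the kind of check concrete, here is a minimal sketch in Python
(not the code of the validator [2]; the function names are made up for
illustration).  It assumes a reference year is always exactly four
digits and that a URL needs a scheme and a host:

```python
import re
from urllib.parse import urlparse


def check_year(value):
    """Return a message if value is not a four-digit year, else None."""
    text = str(value).strip()
    if not re.fullmatch(r"[0-9]{4}", text):
        return f"suspicious year: {text!r}"
    return None


def check_url(value):
    """Return a message if value is not a plausible URL, else None."""
    # Whitespace is tested first, so embedded spaces are reported as such.
    if any(ch.isspace() for ch in value):
        return f"URL contains whitespace: {value!r}"
    parts = urlparse(value)
    if not parts.scheme or not parts.netloc:
        return f"URL lacks scheme or host: {value!r}"
    return None
```

With these assumptions, '200' and '20015' are both reported, and so is
any URL with a space in it.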
 
> 2. URLs with trailing newlines (adapterremoval, aevol, amos, just to
> name a few). This is most likely due to YAML property to append newline
> to the end of multiline strings, which can be quite easily averted [4].
> On the other hand, trailing newlines in URLs could be ignored at all, as
> clearly they are not intentional.

That's helpful as well.  I'd love to see this as a lintian warning.
 
> 3. Numeric months in references (augustus, cluster3, haploview, just to
> name a few). According to [3], "[Reference] keys that correspond to
> standard BibTeX entries must provide the same content", and 1988 BibTeX
> manual from CTAN [5] says "[month:] You should use the standard
> three-letter abbreviation". Of course "should" is not "must" (in terms
> of RFC 2119), but machine-reading would be easier with a consistent
> definition.

Interesting detail.  I admit I do not care much about this - but if it
is specified that way, it is correct to mention it in the lintian check.
I'm not sure whether this should be 'info' or 'pedantic'.  Feel free to
decide yourself.
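
A rough sketch of what such a month check could look like, in Python
(the helper name is hypothetical; the accepted values are the standard
three-letter BibTeX abbreviations, compared case-insensitively, which is
my assumption and not something [3] spells out):

```python
import re

# The standard three-letter month abbreviations used by BibTeX.
BIBTEX_MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec")


def check_month(value):
    """Return a message if value is not a BibTeX month abbreviation."""
    text = str(value).strip().lower()
    if text in BIBTEX_MONTHS:
        return None
    # A numeric month from 1 to 12 can be mapped to its abbreviation.
    if re.fullmatch(r"[0-9]{1,2}", text) and 1 <= int(text) <= 12:
        suggestion = BIBTEX_MONTHS[int(text) - 1]
        return f"numeric month {text!r}; BibTeX abbreviation is {suggestion!r}"
    return f"unrecognised month: {text!r}"
```

Since a numeric month maps unambiguously to an abbreviation, the message
can carry the suggested fix, which fits an 'info' or 'pedantic' tag.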

> 4. E-mail addresses in Bug-Submit (htslib, last-align, nanook, just to
> name a few). Per [3], values of Bug-Submit are URLs. Maybe [3] could be
> amended to cover e-mails too?

It's sensible to permit e-mails here since this is where bugs need to
be submitted for some projects.  Maybe enforcing mailto:e@mail would
make it a proper URL?
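
As an illustration only, a Python sketch that tells the three cases
apart (the function name and the returned labels are invented here;
whether a bare address should be accepted or merely reported is exactly
the open question):

```python
from urllib.parse import urlparse


def classify_bug_submit(value):
    """Classify a Bug-Submit value as url, mailto, bare-email or invalid."""
    text = value.strip()
    parts = urlparse(text)
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"
    # For mailto: URLs the address ends up in the path component.
    if parts.scheme == "mailto" and "@" in parts.path:
        return "mailto"
    # No scheme at all, but it looks like an address: suggest mailto:.
    if not parts.scheme and "@" in text and not any(c.isspace() for c in text):
        return "bare-email"
    return "invalid"
```

A check built on this could accept "url" and "mailto" and suggest
prefixing "mailto:" for "bare-email".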
 
> 5. Unclear scalar/list status of some fields. Only Screenshots is
> defined as "One or more URLs", while in reality lists appear for
> Webservice (clustalw, primer3), Bug-Submit (mira, albeit seems broken).
> Maybe these too could be defined as "One or more URLs"?

I have not thought about this but if there are obvious use cases for
lists it seems to be sensible to permit this.

> 6. Empty templates (agat, intake, libpll-2, just to name a few). I would
> suggest removing the templates, as they do not carry anything meaningful.

That's at least worth a warning - maybe even an error.
 
> 7. DOIs written as URLs (fast, libnewuoa). This is debatable, and [5]
> does not talk about DOIs at all.

DOI is specified[6] and should not be a URL (I've just fixed libnewuoa
while I was checking it ... but left fast untouched to keep an "example"
for you to test with ;-) )
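
A possible check, sketched in Python.  The prefixes it strips and the
"10.<digits>/<suffix>" pattern are heuristics assumed for this example,
not something [3] or [6] prescribes, and the function name is invented:

```python
import re

# Common ways a DOI gets wrapped into a URL or URI (assumed list).
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)
# Heuristic shape of a bare DOI: "10.", a registrant code, "/", a suffix.
BARE_DOI = re.compile(r"^10\.[0-9]{4,9}/\S+$")


def check_doi(value):
    """Return a message if value is not a bare DOI, else None."""
    text = value.strip()
    stripped = DOI_PREFIX.sub("", text)
    if stripped != text:
        return f"DOI written with a prefix; bare form would be {stripped!r}"
    if not BARE_DOI.match(text):
        return f"does not look like a DOI: {text!r}"
    return None
```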
 
> As said earlier, I would be interested in implementing formal validation
> of debian/upstream/metadata in lintian to catch typos and so on.
> However, there are a few ambiguities in the specification, which would
> be really interesting to discuss and resolve.
> 
> Please do not take any part of my text as a critique for anyone. Package
> names are here only for the purpose of illustration.

Your work (including critique as far as it concerns me) is perfectly
welcome and absolutely needed.  I have lost count of how often I
needed to adapt the UDD gatherer for upstream metadata to be tolerant
of different kinds of syntax issues.

The lintian check should also verify typos in field names.  Only
those fields that are specified[3] are permitted.
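
Something along these lines, sketched in Python with the standard
difflib module.  KNOWN_FIELDS below is only an illustrative subset (the
fields named in this thread); a real check would take the full list
from [3].  The function name is hypothetical:

```python
import difflib

# Illustrative subset only; the authoritative list is the one in [3].
KNOWN_FIELDS = {"Bug-Submit", "Reference", "Screenshots", "Webservice"}


def check_field_names(fields):
    """Return one message per field name that is not in KNOWN_FIELDS."""
    messages = []
    for name in fields:
        if name in KNOWN_FIELDS:
            continue
        # Suggest the closest known name, if any is similar enough.
        close = difflib.get_close_matches(name, sorted(KNOWN_FIELDS), n=1)
        if close:
            messages.append(f"unknown field {name!r}; did you mean {close[0]!r}?")
        else:
            messages.append(f"unknown field {name!r}")
    return messages
```

For example, check_field_names(["Refrence"]) would point at "Reference".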

Thanks again

     Andreas.
 
> [1] https://lists.debian.org/debian-science/2021/01/msg00050.html
> [2] https://github.com/merkys/Debian-DEP12, no stable release yet
> [3] https://wiki.debian.org/UpstreamMetadata
> [4] https://yaml-multiline.info/
> [5]
> https://mirror.datacenter.by/pub/mirrors/CTAN/biblio/bibtex/base/btxdoc.pdf

[6] https://en.wikipedia.org/wiki/Digital_object_identifier 

-- 
http://fam-tille.de

