
Re: Validating debian/upstream/metadata for debian-med projects



Very nice, thanks!

Steffen

On 11.02.21 at 14:52, Andreas Tille wrote:
> On Thu, Feb 11, 2021 at 03:16:51PM +0200, Andrius Merkys wrote:
>> Hello,
>>
>> A recent thread on debian-science@ [1] motivated me to look deeper into
>> enforcing quality standards for the debian/upstream/metadata files (a.k.a.
>> DEP 12) we ship with Debian packages. I learnt that lintian already runs
>> a YAML syntax check on debian/upstream/metadata files, but no further
>> validation is performed (to my knowledge). Thus I have developed a
>> formal validation tool [2] to check the contents of these YAML
>> files, mostly the syntax of URLs and of some fields that are defined to
>> correspond to BibTeX as per [3].
>>
>> Yesterday I downloaded the debian/upstream/metadata files from all of
>> the more than 1300 projects under https://salsa.debian.org/debian-med/
>> and ran them against my validator. The resulting validation messages
>> can be grouped into the following categories:
>>
>> 1. Very likely typos: reference year '200' (bagpipe), '20015'
>> (rambo-k), URLs with spaces (bio-tradis) and so on. This category is the
>> one I was actually aiming at.
> That's absolutely cool!  Thanks a lot for this!  I think that should be
> of category lintian error.
>
>> 2. URLs with trailing newlines (adapterremoval, aevol, amos, just to
>> name a few). This is most likely due to YAML appending a newline to the
>> end of multiline strings by default, which can be averted quite easily
>> [4]. On the other hand, trailing newlines in URLs could simply be
>> ignored, as they are clearly not intentional.
> That's helpful as well.  I'd love to see this as a lintian warning.
>
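Regarding point 2: in YAML, block scalars written with `|` or `>` keep one final newline by default, while `|-` and `>-` strip it, so the fix on the packaging side is a one-character change. On the validator side, a minimal Python sketch could tolerate the newline but still report it. The function name `normalize_url` and the message strings are hypothetical and are not taken from the validator in [2].

```python
from typing import List, Tuple


def normalize_url(value: str) -> Tuple[str, List[str]]:
    """Return the URL without trailing line breaks, plus any findings."""
    messages: List[str] = []

    # A trailing newline is almost certainly a YAML block-scalar artefact.
    cleaned = value.rstrip("\r\n")
    if cleaned != value:
        messages.append("trailing newline")

    # Whitespace anywhere else in a URL is more likely a real typo.
    if any(ch.isspace() for ch in cleaned):
        messages.append("whitespace inside URL")

    return cleaned, messages
```

This keeps the two cases apart, so a trailing newline can be reported at a lower severity than a space in the middle of a URL.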
>> 3. Numeric months in references (augustus, cluster3, haploview, just to
>> name a few). According to [3], "[Reference] keys that correspond to
>> standard BibTeX entries must provide the same content", and the 1988
>> BibTeX manual from CTAN [5] says "[month:] You should use the standard
>> three-letter abbreviation". Of course "should" is not "must" (in terms
>> of RFC 2119), but machine-reading would be easier with a consistent
>> definition.
> Interesting detail.  I admit I do not mind a lot about this - but if it
> is specified that way it is correct to mention it in the lintian check.
> I'm not sure whether this should be 'info' or 'pedantic'.  Feel free to
> decide yourself.
>
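Regarding point 3: converting numeric months to the three-letter BibTeX abbreviations is mechanical, so a check could suggest the replacement directly. The sketch below is only an illustration; the name `month_to_bibtex` is hypothetical and the thread does not say how the validator in [2] handles this.

```python
import re
from typing import Optional

# The standard three-letter month abbreviations used by BibTeX.
BIBTEX_MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec")


def month_to_bibtex(value) -> Optional[str]:
    """Map a numeric month or an abbreviation to the BibTeX abbreviation.

    Returns None when the value is neither a number from 1 to 12
    nor one of the standard abbreviations.
    """
    text = str(value).strip().lower()

    # Numeric months such as 3 or "08".
    if re.fullmatch(r"[0-9]{1,2}", text):
        number = int(text)
        if 1 <= number <= 12:
            return BIBTEX_MONTHS[number - 1]
        return None

    # Already abbreviated, possibly with different capitalisation.
    return text if text in BIBTEX_MONTHS else None
```

A check built on this could report the value as-is together with the suggested abbreviation, at whatever severity is chosen.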
>> 4. E-mail addresses in Bug-Submit (htslib, last-align, nanook, just to
>> name a few). Per [3], values of Bug-Submit are URLs. Maybe [3] could be
>> amended to cover e-mails too?
> It's sensible to permit e-mails here, since for some projects this is
> where bugs need to be submitted.  Maybe enforcing mailto:e@mail would
> make it a proper URL?
>
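Regarding point 4: the mailto: idea could be sketched as follows. This assumes that only http, https and mailto values would be accepted for Bug-Submit, which the specification [3] does not state; the function name `bug_submit_as_url` and the deliberately loose e-mail pattern are hypothetical as well.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumption for this sketch: these are the only acceptable schemes.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# Loose pattern: something@something.tld, with no whitespace.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def bug_submit_as_url(value: str) -> Optional[str]:
    """Return a URL form of a Bug-Submit value, or None if it is neither
    an accepted URL nor something that looks like an e-mail address."""
    text = value.strip()

    # Values that already carry an accepted scheme are kept unchanged.
    if urlsplit(text).scheme.lower() in ALLOWED_SCHEMES:
        return text

    # Bare e-mail addresses become mailto: URLs.
    if EMAIL_PATTERN.fullmatch(text):
        return "mailto:" + text

    return None
```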
>> 5. Unclear scalar/list status of some fields. Only Screenshots is
>> defined as "One or more URLs", while in reality lists also appear for
>> Webservice (clustalw, primer3) and Bug-Submit (mira, although that one
>> seems broken). Maybe these too could be defined as "One or more URLs"?
> I have not thought about this but if there are obvious use cases for
> lists it seems to be sensible to permit this.
>
>> 6. Empty templates (agat, intake, libpll-2, just to name a few). I would
>> suggest removing the templates, as they do not carry anything meaningful.
> That's at least worth a warning - maybe even an error.
>
>> 7. DOIs written as URLs (fast, libnewuoa). This is debatable, and [5]
>> does not talk about DOIs at all.
> DOI is specified[6] and should not be a URL (I've just fixed libnewuoa
> while I was checking it ... but left fast as an "example" for you to
> test against ;-) )
>
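Regarding point 7: a DOI given as a URL can be reduced to its bare form by stripping the resolver prefix. The sketch below assumes the common doi.org and dx.doi.org resolvers and a `doi:` prefix, and assumes a registrant prefix of "10." followed by 4 to 9 digits. The name `bare_doi` is hypothetical, and 10.1000/xyz123 is a made-up example value.

```python
import re
from typing import Optional

# Optional resolver or "doi:" prefix, followed by the DOI itself.
# The 4-9 digit registrant length is an assumption of this sketch.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(?P<doi>10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def bare_doi(value: str) -> Optional[str]:
    """Return the DOI without any resolver prefix, or None if the value
    does not look like a DOI at all."""
    match = DOI_PATTERN.match(value.strip())
    if match is None:
        return None
    return match.group("doi")


# A value is "a DOI written as a URL" when stripping changed it.
value = "https://doi.org/10.1000/xyz123"
written_as_url = bare_doi(value) not in (None, value)
```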
>> As said earlier, I would be interested in implementing formal validation
>> of debian/upstream/metadata in lintian to catch typos and so on.
>> However, there are a few ambiguities in the specification, which would
>> be really interesting to discuss and resolve.
>>
>> Please do not take any part of my text as criticism of anyone. Package
>> names are here only for the purpose of illustration.
> Your work (including critique as far as it concerns me) is perfectly
> welcome and absolutely needed.  I have lost count of how often I
> needed to adapt the UDD gatherer for upstream metadata to be tolerant
> of different kinds of syntax issues.
>
> The lintian check should also check for typos in field names.  Only
> those fields that are specified[3] are permitted.
>
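A field-name check of this kind could be sketched with fuzzy matching from the Python standard library. The set `KNOWN_FIELDS` below lists only the field names mentioned in this thread and is deliberately incomplete; a real check would take the full list from [3]. The function name `check_field_name` is hypothetical.

```python
import difflib
from typing import Optional

# Incomplete on purpose: only the fields named in this thread.
KNOWN_FIELDS = ["Bug-Submit", "Reference", "Screenshots", "Webservice"]


def check_field_name(name: str) -> Optional[str]:
    """Return None for a known field, otherwise a message that
    suggests the closest known field if there is one."""
    if name in KNOWN_FIELDS:
        return None

    # get_close_matches returns the best candidates above the cutoff.
    suggestions = difflib.get_close_matches(name, KNOWN_FIELDS, n=1)
    if suggestions:
        return f"unknown field '{name}', did you mean '{suggestions[0]}'?"
    return f"unknown field '{name}'"
```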
> Thanks again
>
>      Andreas.
>
>> [1] https://lists.debian.org/debian-science/2021/01/msg00050.html
>> [2] https://github.com/merkys/Debian-DEP12, no stable release yet
>> [3] https://wiki.debian.org/UpstreamMetadata
>> [4] https://yaml-multiline.info/
>> [5] https://mirror.datacenter.by/pub/mirrors/CTAN/biblio/bibtex/base/btxdoc.pdf
> [6] https://en.wikipedia.org/wiki/Digital_object_identifier
>

