[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Fwd: Re: cme dpkg-copyright experiences



Hello Stuart

Sorry for this late reply. I've cc'ed debian-perl team as they often have good ideas regarding cme dpkg evolution.

On Tuesday, 1 May 2018 08:39:25 CEST you wrote:
> Thank you for your work on cme and on the update dpkg-copyright module. I've
> looked at it on your blog with interested... I've experimented with it a
> couple of times now and I'm hoping that it gives me a better way of
> maintaining the d/copyright files.

I certainly hope so. I've recently updated the copyright information of 
libuv1 without much trouble.

> I had some feedback that I thought I should offer back to help improve dpkg-
> copyright.

By all means, you're most welcome :-)

> I'll need your guidance in how to properly formulate these into
> useful bug reports, I think. My thoughts are tiny and a bit amorphous ,
> perhaps not being particularly actionable at present -- this is mostly
> because I don't understand enough of what is going on with cme.
> Actually, I'll try to provide patches for the POD based on your answers to
> the questions below ;)

:-)

> The below might look like a clueless user ranting but it's intended to be
> the opposite! dpkg-copyright is an awesome tool and only testing by people
> who aren't familiar with its internals will it get battle hardened and see
> broader usage.

Heart-fully agreed

> # What is scanned?
> 
> https://manpages.debian.org/unstable/libconfig-model-dpkg-perl/
> Dpkg::Copyright::Scanner.3pm.en.html
> 
> Talks of 'patterns':
> - Pattern isn't defined: are they globs or regexps?

These are Perl regexp patterns

> - Are the regexps rooted or do they match anywhere in the file path?

They match anywhere. '^' or '$' must be used to create a regexp that 
matches only the beginning or the end of a path.

> - Is the whole file path (including the builddir) fed to the regexp or only
> from the base of the source package, in which case is the leading / included
> or not?

the relative path from where the command is run. E.g.:
./lib/Config/Model/models/Systemd/Section/Unit.pl

Actually, scan-copyright or cme parses the output of licensecheck :
https://salsa.debian.org/perl-team/modules/packages/libconfig-model-dpkg-perl/blob/master/lib/Dpkg/Copyright/Scanner.pm#L199

> # Filling the blanks
> 
> The same questions about patterns and filename matching apply except for the
> anchoring which is explicitly defined. (the example of ".*/NOTICE" hinted
> me at them being regexs.)

They are also Perl regexp pattern. This time, they match only from the 
beginning of the path. i.e. a '^' is added to the beginning of the pattern. 
Unfortunately, it did not occur to me that this is not consistent with the 
previous usage of regexp patterns.

Speaking of patterns.. While using the word pattern is technically more 
exact, people mostly know about regexp. May be regexp word should 
be used instead of pattern.

> «See the "/src" dir in the example below» should be "src/".

yes

> Multiple matches: «the data attached to more specific path (e.g. "3rdparty/
> foo/blah.c") are applied before more generic patterns (e.g. "3rdparty/foo/"
> » does that mean that the the generic info will override the specific info?
> Or once a file matches a rule, does matching stop for that file?

The latter. A more specific pattern trumps the generic one, whatever the order 
they are declared in the yaml file. This explains why the pattern are anchored 
to the beginning of the path. Sorting pattern not matching from the beginning 
of the path would not make sense.

> `license` key: «must contain a license short name as returned by
> "license_check"» What is license_check? Cross-ref? 

Its licensecheck command:
https://manpages.debian.org/unstable/licensecheck/licensecheck.1p.en.html

Now that --deb-fmt option is used with licensecheck, the license declared in 
fill.copyright.blanks.yml should be a SPDX license name.

> Is that a key like is
> used in DEP-5? Is it allowed to be a non-simple key like "GPL-2 or
> LGPL-3+"?

I should have written "license value" instead of "license key". Anyway, the 
value should be a SPDX license name, or a license that is declared in the 
Stand-alone license paragraph.
see https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#stand-alone-license-paragraph

> A note on yaml syntax not needing quotes on the values most of the time
> would be nice. A hint that you can do multiline entries successfully with
> 
> 	copyright: |-
> 		2001, foo bar
> 		2002, quux
> 
> would be good too.

Sure. Although, the exact syntax may be modified if 'cme edit dpkg' is used to 
edit fill.copyright.blanks.yml

> The patterns in fill.copyright.blanks.yml don't start with / but the ones in
> copyright-scan-patterns.yml; is that intentional?

Uh... In copyright-scan-patterns.yml, '/' must be used if you want to match a 
directory name and not part of a directory name.

This is designed to feed check and ignore patterns to  --check and --ignore options of licensecheck.
This program works with path like './t/foo.t' so a pattern excluding t directory should be '^/t/'

Due #842368, licensecheck --check option does not work, and I don't think Jonas wants to fix that.
 licensecheck now scans all files except the one that are explicitly ignored. 

So the check part is done in scan-copyright where the leading './' was removed. Fortunately, check 
pattern is mostly about file suffix.

This is quite a mess. But I don't know what can be done to improve it.
 
> If cme is going to warn or error about config, could it please include the
> line numbers of where that config option is? I ended up having to do binary
> searches by deleting config entries to find out what it was complaining
> about.

Due to the design of Config::Model (which is generic and can be used with a GUI), this is not so
simple. 

I'll see what I can do. Could you please log a bug against libconfig-model-dpkg-perl so I don't forget about it ?

> Finally on filling the blanks, I either found a bug or I completely
> misunderstand what is supposed to be happening here... In the cantata
> package, I needed to fix up some parsing where a multiline copyright was
> not being captured:
> 
> https://salsa.debian.org/multimedia-team/cantata/blob/master/debian/
> fill.copyright.blanks.yml#L48
> 
> I originally put the Craig Drummond entry first for this entry (and the
> other two) and that caused the entries to disappear from the coalesced
> output. If I changed them to "Craig Drummond A" "… B" and "… C" then these
> three overridden sections appeared OK. Moving Craig to the last entry also
> made them all appear correctly. It feels like the coalescence phase is
> disappearing these entries into the overall package copyright which is
> Craig with those same years.

The entry will disappear if they are identical to the entry of the upper directory.

I.e all entries matching:
mpd-interface/(mpdc|mpdp|mpds|output|playlist|song).*:
    override-copyright: |-
        2008, Sander Knopper (sander AT knopper DOT tk)
        Roeland Douma (roeland AT rullzer DOT com)
        2011-2018, Craig Drummond <craig.p.drummond@gmail.com>
    license: GPL-2+

disappear if mpd-interface/* has the same copyright and license:

In your case, there may be a bug because the copyright is different:
(cme does not infer that sander@knopper.tk is identical as sander AT knopper DOT tk):

Files: mpd-interface/*
Copyright: 2008, Sander Knopper (sander@knopper.tk)
 Roeland Douma (roeland@rullzer.com)
 2011-2018, Craig Drummond <craig.p.drummond@gmail.com>
License: GPL-2+


> # Tweaking results
> 
> https://manpages.debian.org/unstable/libconfig-model-dpkg-perl/
> Config::Model::Dpkg::Copyright.3pm.en.html
> 
> The docs for fix.copyright are pretty hard to follow. I still don't
> understand the format documentation even having used it on a couple of
> packages.

The format is explained in details in Config::Model::Loader:
http://search.cpan.org/perldoc?Config%3A%3AModel%3A%3ALoader

The pod contains a link, but this is lost in translation.

> (the line wrapping on the html version is very unhelpful here and even in an
> 80 char terminal it's painful to read the examples - can the POD be tweaked
> in some way to make that better?)

yes. This part is a code section of the pod. It can be modified to have all lines shorter than 80 chars

> - «Lines breaks are ignored.» Does that mean this entire file is treated as
> one line? Or that you can put a line break inside " … " and it works?

The entire file is treated as one line. A line break inside quote will be passed verbatim to the 
config parameter. Which may break if this config parameter is a uniline.

E.g.

 Maintainer="Dom Dumont
 <dod@debian.org>"

is correct syntically correct but will break because Maintainer value cannot contain a line break.

> - the examples start with "! copyright" but if you use that you get a
> warning back from cme asking you if you misspelled Copyright. Has the
> format changed?

Ouch. 'copyright' is the parameter name used to access copyright data when running cme on dpkg.

The example will work with 'cme update dpkg' but will break with 'cme update dpkg-copyright'

This is confusing. "fix.copyright" file is intended for copyright and should work the same way whether they are
run with cme dpkg or cme dpkg-copyright. Please log a bug.

> - some items use : while others use = . Is there a difference?

':' is for a key of a hash or an index of a list. '=' is used to assign a value.

The example  '! copyright License:MIT' is confusing because it creates a MIT entry in License hash. The text part is filled
automatically by Config::Model. Without the automatic part, the instruction would be :

 ! copyright License:MIT text="blah blah
MIT license blah
blah blah"

> - There's a space between License and short_name?

yes. short_name is an attribute of License. This becomes clearer once you use the GUI (cme edit dpkg-copyright).
The GUI will show the actual structure of the copyrgith data stored in Config::Model.

> - I assume ~ is meaning to interpret the next bit of data as a regexp?

yes

> - if it's a regexp, are the // required? 

yes. Otherwise, the parser cannot distinguish File:~foo (delete the 'foo' file) 
from File:~/foo/ (delete any file entry that match /foo/)

> (why are they required here and not
> in the other files?)

The other files are used by the dpkg specific part of Config::Model. fix.copyright file is 
interpreted by Config::Model::Loader which is generic.

> - I don't think this file is handled in a UTF-8 safe way; 

That's possible.

> I tried to set a
> copyright entry to "Téo" and cme generated decode errors. (A real test case
> is probably needed for this)

Sure. I've some utf8 test cases, but I may have missed this one.

> One final tweak that would make sense for the cantata copyright file is to
> coalesce based on subdirectory a bit more, particularly in the 3rdparty/
> tree. The 3rdparty/* directories are separate projects or widgets designed
> to be copy+pasted into projects (yuck!) but we weirdly end up with a
> copyright entry for "3rdparty/*" for one author who has done a couple of
> different things, and then that gets overridden for the things he did not
> do:
> 
> https://salsa.debian.org/multimedia-team/cantata/blob/master/debian/
> copyright#L18
> 
> It's a minor thing, but if I could forbid dpkg-copyright from generating an
> entry for 3rdparty/* or hint it to not coalesce above 3rdparty/*/ or so, I
> would do so, making a slightly more longer but conceptually simpler to
> understand file

I'm not a fan of creating a special case for a  directory that happens to be named 3rdparty.
I'll think about it. No promise there.

> # Finding the documentation
> 
> I completely understand why POD and pod2man are attractive. For the non-perl
> person, these are impenetrable filenames / man page names and tab
> completion doesn't always work very nicely with them either. I also spent
> ages trying to work out why Config::Model::models::Dpkg::Copyright did not
> appear to the documentation I was looking at only to eventually find that
> there was also Config::Model::Dpkg::Copyright. Is there a way to turn these
> PODs into some good user-facing documentation? (I don't know what perl-land
> does for that).

The site search.cpan.org and metacpan.org create nice documentation from pod files.
I used to ship Config::Model::Dpkg to get the documentation on CPAN, but it was a 
lot of extra work because most tests failed on non Debian system. I finally gave up.

Since gitlab can display nice pod doc:
https://salsa.debian.org/perl-team/modules/packages/libconfig-model-tester-perl/blob/master/README.pod

May be the pod doc could be stored in a separate file along the Perl files so all doc
could be found on Salsa ? (albeit, the doc would no be easy to find).

Or may be, we (debian-perl team) should create a META_CPAN site that would show
only Perl modules available on Debian..

Many thanks for taking the time to improve cme dpkg doc :-)

All the best

Dod




Reply to: