Re: making debian/copyright machine-interpretable
On Sat, Aug 04, 2007, Joey Hess wrote:
> * Others have mentioned the ordering problem that puts the main license
> last. Seems that Packaging-Copyright at the top is another case of
> this problem (see you've now removed that special case name, but the
> debian/* data would still go there). "Most specific matching glob wins"
> might be a better approach. Then the licenses could be ordered with the
> main one first, or in whatever order that makes sense to humans, and
> if someone wanted to write a tool to extract a given file's license,
> that could be done too.
ACK. I edited the wiki to reflect your and Zack's view which seems to
be the preferred way.
For the sake of simplicity, I interpreted "most specific" as "matches
the fewest files". It has the drawback of possibly changing with the
contents of the source tree, but I fear that any other interpretation is
going to be ambiguous in some annoying corner cases.
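A rough sketch of that rule in Python, assuming fnmatch-style globs and
a flat list of paths for the source tree. The function name and the
sample stanzas are made up for illustration and are not part of the
proposal:

```python
from fnmatch import fnmatch

def license_for(path, stanzas, tree):
    """Return the license whose matching glob covers the fewest files.

    stanzas: list of (glob, license) pairs, in any order (hypothetical layout).
    tree:    list of every file path in the source tree.
    Ties go to the stanza listed first; no match returns None.
    """
    best = None
    for glob, lic in stanzas:
        if not fnmatch(path, glob):
            continue
        # "Most specific" is taken to mean "matches the fewest files".
        count = sum(1 for f in tree if fnmatch(f, glob))
        if best is None or count < best[0]:
            best = (count, lic)
    return best[1] if best else None

# Illustrative data only.
tree = ["src/main.c", "src/util.c", "debian/rules", "debian/control", "README"]
stanzas = [("*", "GPLv2+"), ("debian/*", "LGPLv2.1")]
```

With these stanzas, debian/rules resolves to LGPLv2.1 and README to
GPLv2+, whatever order the stanzas are listed in. Note that Python's
fnmatch lets "*" match "/" as well, which may or may not be what the
final format wants.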
> * Having to munge the license text to fit it in the 822 format is one of
> the uglier bits of this proposal, especially since we don't require
> that license texts be DFSG free..
Any idea on how to fix that? I reformat license texts with leading "|"s
quite often, so this didn't really strike me as particularly ugly. The
822 format seemed like a safe way to escape a license text; if the dots
are really an issue, one can use U+00A0 NO-BREAK SPACE or U+FEFF ZERO
WIDTH NO-BREAK SPACE or anything crazy like that.
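For reference, a minimal sketch of the escaping in Python. It assumes
the same convention as long descriptions in debian/control: each
continuation line gets a leading space, and an empty line becomes " .".
The helper names are hypothetical:

```python
def fold_license_text(text):
    """Turn free text into 822-style continuation lines."""
    folded = []
    for line in text.splitlines():
        # Empty (or whitespace-only) lines become " ."; others get one space.
        folded.append(" " + line if line.strip() else " .")
    return "\n".join(folded)

def unfold_license_text(folded):
    """Reverse fold_license_text.

    Caveat: a line that was exactly "." in the original also folds to " ."
    and therefore comes back as an empty line.
    """
    lines = []
    for line in folded.splitlines():
        lines.append("" if line == " ." else line[1:])
    return "\n".join(lines)
```

The caveat in the second helper is the kind of corner case where a
different placeholder character would help.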
> * It's a shame that the boilerplate about where to find the full text of
> the GPL is still needed at the end of the file. One way to avoid this
> might be to use:
> License: /usr/share/common-licenses/GPL-2
> The info about which versions apply would need to be expressed some
> other way though.
ACK. How about parentheses?
License: GPLv2+ (/usr/share/common-licenses/GPL-2) | MPL | LGPLv2.1 (/usr/share/common-licenses/LGPL-2.1)
The drawback is that the lines can now become very long, and wrapping
them means it's no longer possible to say "first line is license list,
the rest is freeform text". But we can live with long lines, I guess.
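To show that such a line stays easy to parse, here is a sketch in
Python. It assumes "|" separates alternatives and that each entry is a
short name optionally followed by a path in parentheses; the function
name and the exact grammar are my assumptions, not a settled syntax:

```python
import re

# One entry: a name without spaces or parentheses, then an optional "(path)".
ENTRY = re.compile(r"^(?P<name>[^()\s]+)(?:\s*\((?P<path>[^()]+)\))?$")

def parse_license_line(value):
    """Split a License: value into (name, path-or-None) pairs."""
    entries = []
    for part in value.split("|"):
        m = ENTRY.match(part.strip())
        if m is None:
            raise ValueError("cannot parse license entry: %r" % part)
        entries.append((m.group("name"), m.group("path")))
    return entries
```

Applied to the example line above, this yields three entries, with MPL
carrying no path.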
> * I don't see much benefit in putting freeform text at the top of the
> file. Keeping it all at the bottom would simplify parsing/validating.
I tend to agree with you. The first version was like this, but I got
many suggestions to add freeform text at the beginning in order to keep
the file human-readable, so I switched to allowing it everywhere. I'll
wait for more comments on why it may be useful, but the paragraph below
about tarball origin already seems a valid use case.
> * Makes even more clear that debian/copyright is not the best place for
> Source URLs. They rather stick out from the other data, and this would
> be a great time to go ahead and move them to the control file.
> Dropping them entirely in favour of watch files -- not so good: It's
> good to know where a package came from even if a tarball can't be
> auto-extracted from there by uscan.
But we need freeform text to express how we got the source. When a
URL is available, it's all right. But you cannot express "tarball made
from branch *** of SVN repository ***, stripped of non-free GFDL files
*** and of the patented algorithm in file ***, then bootstrapped using
automake version ***" with a URL.
That doesn't mean we shouldn't have the URL in debian/control if
applicable, of course. Just that it can be done in
a separate process and doesn't necessarily involve removing it from