Re: Bug#885698: What licenses should be included in /usr/share/common-licenses?
Jonas Smedegaard <jonas@jones.dk> writes:
> Strictly speaking it is not (as I was more narrowly focusing on) that
> the current debian/copyright spec leaves room for *ambiguity*, but
> instead that there is a real risk of making mistakes when replacing with
> centrally defined ones (e.g. redefining a local "Expat" from locally
> meaning "MIT-ish legalese as stated in this project" to falsely mean
> "the MIT-ish legalese that SPDX labels MIT").
Right, the existing copyright format defines a few standard labels and
says that you should only use those labels when the license text matches,
but it doesn't stress that "matches" means absolutely word-for-word
identical.  I suspect, although I haven't checked, that we've made at
least a few mistakes where some license text that's basically equivalent
to Expat is labelled as Expat even though the text is not word-for-word
identical.  Given that currently all labels in debian/copyright are
essentially local and the full text is there (except for common-licenses,
where apart from BSD the licenses normally are used verbatim), this is not
currently really a bug.  But we could turn it into a bug quite quickly if
we relied on the license short name to look up the text.
To take an example that I've been trying to get rid of for over a decade,
many of the /usr/share/common-licenses/BSD references currently in the
archive are incorrect.  There are a few cases where the code is literally
copyrighted only by the Regents of the University of California and uses
exactly that license text, but this is not the case for a lot of them.  It
looks like a few people have even tried to say "use common-licenses but
change the name in the license" rather than reproducing the license text,
which I don't believe meets the terms of the license (although it's of
course very unlikely that anyone would sue over it).
A quick code search turns up the following examples, all of which I
believe are wrong:
https://sources.debian.org/src/mrpt/1:2.10.0+ds-3/doc/man-pages/pod/simul-beacons.pod/?hl=35#L35
https://sources.debian.org/src/gridengine/8.1.9+dfsg-11/debian/scripts/init_cluster/?hl=7#L7
https://sources.debian.org/src/rust-hyphenation/0.7.1-1/debian/copyright/?hl=278#L278
https://sources.debian.org/src/nim/1.6.14-1/debian/copyright/?hl=64#L64
https://sources.debian.org/src/yade/2023.02a-2/debian/copyright/?hl=78#L78
An example of one that probably is okay, although ideally we still
wouldn't do this because there are other copyrights in the source:
https://sources.debian.org/src/lpr/1:2008.05.17.3+nmu1/debian/copyright/?hl=15#L15
This problem potentially would happen a lot with the BSD licenses, since
the copyright-format document points to SPDX and SPDX, since it only cares
about labeling legally-equivalent documents, allows the license text to
vary around things like the name of the person you're not supposed to say
endorsed your software while still receiving the same label.
We therefore cannot use solely SPDX as a way of determining whether we can
substitute the text of the license automatically for people, because there
are SPDX labels for a lot of licenses for which we'd need to copy and
paste the exact license text because it varies.  At least if I understand
what our goals would be.
(License texts that have portions that vary between packages they apply to
are a menace and make everything much harder, and I really wish people
would stop using them, but of course the world of software development is
not going to listen to me.)
-- 
Russ Allbery (rra@debian.org)              <https://www.eyrie.org/~eagle/>
Reply to: