multiple roles of d/copyright
On Tue, 08 Feb 2022 at 08:59:23 -0500, Scott Kitterman wrote:
> From my point of view, treating something like other common classes of RC bugs
> means that the project is producing tools and processes to make detection of
> such bugs more automated to remove them from the archive, that developers are
> actively looking for them, and that they are routinely fixed in the normal
> course of Debian development.
I think part of the problem here might be that copyright information is
"social", not "technical": software authors can claim copyright and/or
authorship in various forms of human-readable, free-form text, which means
any automated detection is necessarily going to be imperfect, and as long
as our policy demands perfection, there will be a reluctance to automate
this (or at least a reluctance to say that we are automating it).
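To illustrate why I say "imperfect", here is a minimal sketch of the sort of naive scanner one might write. The regular expression and the function name `find_copyright_claims` are hypothetical, made up for this example, and are not taken from any existing Debian tool.

```python
import re

# Hypothetical heuristic: match text such as "Copyright (C) 2001 Some Name".
# Real notices are free-form, so this both misses claims and over-matches.
NOTICE = re.compile(
    r"(?:copyright|\(c\)|©)\s*(?:\(c\)|©)?\s*"      # the word, the symbol, or both
    r"(?P<years>\d{4}(?:\s*[-,]\s*\d{4})*)?,?\s*"   # optional year or year list
    r"(?P<holder>[^\n]+)",                          # rest of the line
    re.IGNORECASE,
)


def find_copyright_claims(text):
    """Return a sorted list of distinct holder strings found in text."""
    holders = set()
    for line in text.splitlines():
        match = NOTICE.search(line)
        if match:
            # Trim comment decoration such as a trailing "*/".
            holder = match.group("holder").strip(" */")
            if holder:
                holders.add(holder)
    return sorted(holders)


sample = (
    "/* Copyright (C) 2001, 2003 Red Hat, Inc. */\n"
    " * Written by Jane Doe, loosely based on code by John Smith\n"
)
claims = find_copyright_claims(sample)
false_positive = find_copyright_claims("See the copyright policy")
```

On this sample the scanner picks up the formal notice but misses the "Written by" authorship claim entirely, and it reports "policy" as a copyright holder for a sentence that merely mentions the word. Any heuristic of this shape will have both kinds of error.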
Another part of the problem is that licensing and copyright-information
bugs are not something that we are realistically going to find through
normal use of software: if GTK crashes when you print on a Tuesday, one
of our users will eventually notice, but if we have missed a copyright
holder, it's unlikely that anyone is going to notice that omission from
the list of around 400 potential copyright holders in
<https://tracker.debian.org/media/packages/g/gtk4/copyright-4.6.0ds1-3>
unless they repeat the time-consuming process of collecting possible
copyright claims from the source code (as the ftp team presumably do). I
have no idea how the maintainers of larger and more complicated packages
manage to do this, or how the ftp team manage to review larger and more
complicated packages in a finite time.
I think the copyright file is doing several things which are perhaps in
conflict:
* It lets consumers of packages know what restrictions apply to their
use of a package
- This requires *most* of the license information, although not
necessarily all of it: for example if a package like Linux is licensed
under a mixture of GPL, LGPL, BSD and MIT licenses, it's usually
sufficient to be aware of the most restrictive of those licenses, in
this case GPL
- Having too much information, however well-intentioned, actually works
against this by making it harder to find what you need
- I would argue that requiring the text of licenses like the CC family
to be inlined into the copyright file works against this goal, by
reducing the signal-to-noise ratio: if you are not familiar with a
particular license, then obviously you will need to read its text
to see what it means, but if you are looking at packages that have
content under various semi-common licenses, you only need to read
each license once
- I would argue that requiring lists of copyright holders to be inlined
into the same file also works against this goal, again by harming
the signal-to-noise ratio
* It lets consumers of packages know that the package is DFSG-compliant
- Same requirements as above
* It's a place to reproduce information that licenses require us to, like
a comprehensive set of copyright notices (if our interpretation of the
applicable licenses is that pointing to nearby source code and calling
it extremely comprehensive accompanying documentation is insufficient)
- In this role, it's essentially write-only: we're doing this because
we have been required to do it, more than because it's practically
useful, and I don't expect anyone to actually read this, except for
the maintainer when collecting it and the ftp team when verifying
that it has been collected
- In another subthread, Stephan Lachnit suggests using the SPDX format
for this write-only information, which I think might be intended as
a way to eventually separate it from the other roles of d/copyright
* It gives authors due credit (which we are not *required* to do, but
in previous discussions of d/copyright I've seen this cited as a reason
why we *should* do this in order to be good citizens)
- Note that collecting copyright holders is not necessarily actually
helpful here, because that often means we are required to "credit"
an employer, rather than mentioning the actual author
- In a medium-sized package like GTK, it's not clear to me that a list of
about 400 possible copyright holders is actually serving this purpose,
because any individual contributor is lost in the noise
* It lets us meet our self-imposed rules
- This is circular, so I'm inclined to disregard it when discussing what
the rules should be: we should set rules because they help us to
achieve a goal, rather than for the sake of having rules
* It lets the ftp team (or other interested reviewers) duplicate the
info-collecting process to check that all of the above have been done
- This is somewhat circular, because this is a way to support the other
goals, not really a goal in its own right
* Are there other relevant goals that I've missed here?
I don't think conflating those goals and assuming they all need to be
satisfied by a single file is necessarily going to lead to meeting any
of those goals in an efficient way, let alone meeting all of them in
an efficient way.
smcv