
multiple roles of d/copyright



On Tue, 08 Feb 2022 at 08:59:23 -0500, Scott Kitterman wrote:
> From my point of view, treating something like other common classes of RC bugs 
> means that the project is producing tools and processes to make detection of 
> such bugs more automated to remove them from the archive, that developers are 
> actively looking for them, and that they are routinely fixed in the normal 
> course of Debian development.

I think part of the problem here might be that copyright information is
"social", not "technical": software authors can claim copyright and/or
authorship in various forms of human-readable, free-form text, which means
any automated detection is necessarily going to be imperfect, and as long
as our policy demands perfection, there will be a reluctance to automate
this (or at least a reluctance to say that we are automating it).
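To illustrate why automated detection is imperfect, here is a minimal
sketch of a naive scanner. The pattern, the find_claims name and the sample
text are all hypothetical, made up for this example; this is not how any
existing Debian tool works.

```python
import re

# Hypothetical pattern: "Copyright", an optional "(C)" or "©", one or more
# years, then the holder. Real notices vary far more than this.
NOTICE = re.compile(
    r"copyright\s+(?:\(c\)\s+|©\s+)?"
    r"(?P<years>\d{4}(?:\s*[-,]\s*\d{4})*)[\s,]+"
    r"(?P<holder>.+)",
    re.IGNORECASE,
)


def find_claims(text):
    """Return (years, holder) pairs for lines that match NOTICE."""
    claims = []
    for line in text.splitlines():
        match = NOTICE.search(line)
        if match:
            # Trim trailing comment delimiters such as " */".
            holder = match.group("holder").strip(" */")
            claims.append((match.group("years"), holder))
    return claims


# Made-up sample: the third line is an authorship claim in free-form
# text, which the pattern does not recognise.
sample = """\
/* Copyright (C) 2020 Jane Doe */
# Copyright 2018-2019, Example Corp
// Written by A. Hacker in 2017; all rights reserved.
"""

print(find_claims(sample))
```

The first two lines are picked up, but the "Written by" line is silently
missed, which is exactly the kind of omission that a policy demanding
perfection cannot tolerate.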

Another part of the problem is that licensing and copyright-information
bugs are not something that we are realistically going to find through
normal use of software: if GTK crashes when you print on a Tuesday, one
of our users will eventually notice, but if we have missed a copyright
holder, it's unlikely that anyone is going to notice that omission from
the list of around 400 potential copyright holders in
<https://tracker.debian.org/media/packages/g/gtk4/copyright-4.6.0ds1-3>
unless they repeat the time-consuming process of collecting possible
copyright claims from the source code (as the ftp team presumably do). I
have no idea how the maintainers of larger and more complicated packages
manage to do this, or how the ftp team manage to review larger and more
complicated packages in a finite time.

I think the copyright file is doing several things which are perhaps in
conflict:

* It lets consumers of packages know what restrictions apply to their
  use of a package
  - This requires *most* of the license information, although not
    necessarily all of it: for example if a package like Linux is licensed
    under a mixture of GPL, LGPL, BSD and MIT licenses, it's usually
    sufficient to be aware of the most restrictive of those licenses, in
    this case GPL
  - Having too much information, however well-intentioned, actually works
    against this by making it harder to find what you need
  - I would argue that requiring the text of licenses like the CC family
    to be inlined into the copyright file works against this goal, by
    reducing the signal-to-noise ratio: if you are not familiar with a
    particular license, then obviously you will need to read its text
    to see what it means, but if you are looking at packages that have
    content under various semi-common licenses, you only need to read
    each license once
  - I would argue that requiring lists of copyright holders to be inlined
    into the copyright file also works against this goal, again by
    harming the signal-to-noise ratio

* It lets consumers of packages know that the package is DFSG-compliant
  - Same requirements as above

* It's a place to reproduce information that licenses require us to, like
  a comprehensive set of copyright notices (if our interpretation of the
  applicable licenses is that pointing to nearby source code and calling
  it extremely comprehensive accompanying documentation is insufficient)
  - In this role, it's essentially write-only: we're doing this because
    we have been required to do it, more than because it's practically
    useful, and I don't expect anyone to actually read this, except for
    the maintainer when collecting it and the ftp team when verifying
    that it has been collected
  - In another subthread, Stephan Lachnit suggests using the SPDX format
    for this write-only information, which I think might be intended as
    a way to eventually separate it from the other roles of d/copyright

* It gives authors due credit (which we are not *required* to do, but
  in previous discussions of d/copyright I've seen this cited as a reason
  why we *should* do this in order to be good citizens)
  - Note that collecting copyright holders is not necessarily actually
    helpful here, because that often means we are required to "credit"
    an employer, rather than mentioning the actual author
  - In a medium-sized package like GTK, it's not clear to me that a list of
    about 400 possible copyright holders is actually serving this purpose,
    because any individual contributor is lost in the noise

* It lets us meet our self-imposed rules
  - This is circular, so I'm inclined to disregard it when discussing what
    the rules should be: we should set rules because they help us to
    achieve a goal, rather than for the sake of having rules

* It lets the ftp team (or other interested reviewers) duplicate the
  info-collecting process to check that all of the above have been done
  - This is somewhat circular, because this is a way to support the other
    goals, not really a goal in its own right

* Are there other relevant goals that I've missed here?

I don't think conflating those goals and assuming they all need to be
satisfied by a single file is necessarily going to lead to meeting any
of those goals in an efficient way, let alone meeting all of them in
an efficient way.

    smcv

