
Re: Automated copyright reviews using REUSE/SPDX as alternative to DEP-5



On Thu, Jan 27, 2022 at 12:39 AM Phil Morrell <debian@emorrp1.name> wrote:
>
> TLDR: I think REUSE.software is a bad idea that is worse than what
> Debian already invented with Machine-readable debian/copyright file. I
> guess if upstream uses it, there's no reason not to ignore that as a
> source of copyright assertions.

I expected some concerns about the complexity of the SPDX document,
but certainly not about standardized copyright information in source
files.

Yes, Debian may have invented the machine-readable copyright file, but
not machine-readable copyright information in source files. This is
what REUSE is all about, and it greatly reduces manual labor - I don't
understand how this can be seen as bad.

On Thu, Jan 27, 2022 at 12:39 AM Phil Morrell <debian@emorrp1.name> wrote:
>
> I *am* a big fan of SPDX-License-Identifier, but the above being
> straightforward is only true for the most trivial of examples. REUSE
> advocate for sprinkling .license files around your repo for e.g. logos
> and other binaries. Same story with multiple authors, they recommend
> using multiple FileCopyrightText's initially, then split it out to a
> separate AUTHORS file and use something like "Project X contributors".

No, it does not only work for trivial examples. Take any project with
a significant amount of code, e.g. [1], and most of the time you will
find that every source file has the copyright information in the
header. The problem is, there has been no standardized way to parse
them. That's why we have tools like licensecheck that try to find it
out. With REUSE, it gets much, much easier.
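
To illustrate how little parsing a standardized header needs, here is a
minimal Python sketch. It only assumes that the header uses the
SPDX-License-Identifier and SPDX-FileCopyrightText tags; the function
name parse_reuse_header and the sample file contents are made up for
illustration, and this is not how the reuse tool itself is implemented.

```python
import re

# Matches "SPDX-License-Identifier: <value>" or
# "SPDX-FileCopyrightText: <value>" anywhere in a line, so the comment
# prefix in front of the tag ("//", "#", " * ", ...) does not matter.
_TAG_RE = re.compile(r"SPDX-(License-Identifier|FileCopyrightText):\s*(.+?)\s*$")


def parse_reuse_header(text):
    """Collect license expressions and copyright lines from file contents."""
    info = {"licenses": [], "copyrights": []}
    for line in text.splitlines():
        match = _TAG_RE.search(line)
        if match is None:
            continue
        tag, value = match.groups()
        # Drop a trailing block-comment terminator such as "*/".
        if value.endswith("*/"):
            value = value[:-2].rstrip()
        if tag == "License-Identifier":
            info["licenses"].append(value)
        else:
            info["copyrights"].append(value)
    return info


# Hypothetical file contents, for illustration only.
SAMPLE = """\
// SPDX-FileCopyrightText: 2022 Jane Doe <jane@example.org>
// SPDX-FileCopyrightText: 2021 Example Project contributors
// SPDX-License-Identifier: GPL-3.0-or-later

int main(void) { return 0; }
"""

if __name__ == "__main__":
    print(parse_reuse_header(SAMPLE))
```

Compare that with the heuristics a tool needs when every project
words its header differently.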

Wrt the .license files: yes, they're ugly, but still better than no
automation at all. With the new YAML spec, I suspect that these will
go away.
Wrt multiple authors: this is not the fault of REUSE, but just how
copyright works.

> Ultimately, when everything becomes too much, REUSE falls back to
> recommending Debian's copyright format anyway! So even if upstream sees
> the value in taking some copyright busywork off our hands, why not
> suggest they just use it in the first place in e.g. the LICENSE file.

Sigh, yes, because Debian's format is afaik the only standardized,
easy to parse format out there. But the reason why it is there is
*not* for "when everything becomes too much", but for files that you
cannot or don't want to alter. For example, if you regularly import
3rd-party code that does not follow REUSE and you don't want to edit
the header every time. Note that if everyone used REUSE, that would
not be a problem. Another example is when you have tiny example code
or configs that you want to present to a user, but without any
distracting comments (think beginner tutorials).
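
As a rough illustration of how such a central file covers files that
carry no header, here is a Python sketch of the lookup. The stanzas and
paths are hypothetical, and fnmatch is only an approximation of the
copyright-format wildcards (it additionally accepts [seq] patterns);
what it does mirror is that "*" also matches slashes and that the last
matching Files paragraph applies.

```python
from fnmatch import fnmatchcase

# Hypothetical stanzas in file order: (Files patterns, License short name).
STANZAS = [
    (["*"], "GPL-3.0-or-later"),
    (["third_party/*"], "MIT"),
    (["docs/examples/*.cfg"], "CC0-1.0"),
]


def license_for(path, stanzas=STANZAS):
    """Return the license of the last stanza whose patterns match path."""
    result = None
    for patterns, license_name in stanzas:
        # fnmatchcase is case-sensitive and lets "*" match "/" as well.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            result = license_name  # later stanzas override earlier ones
    return result


if __name__ == "__main__":
    for path in ("src/main.c", "third_party/zlib/inflate.c",
                 "docs/examples/minimal.cfg"):
        print(path, "->", license_for(path))
```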

However, they want to switch from DEP-5 to a more flexible (i.e.
non-central, relocatable) spec [2]. And there is good reason to do so:
for example, we as Debian could specify the copyright information for
our packaging separately from the upstream code, without conflict.
DEP-5 does not allow that.

> Firstly, I didn't think it was called DEP-5 anymore - it was accepted
> into policy in 2012 as "copyright-format" titled "Machine-readable
> debian/copyright file", so no longer a proposal for enhancement. This
> would be a minor pedantic point (a colloquialism) except for the fact
> that REUSE encourages it as part of their interface: `.reuse/dep5`.

Yes, it is called "Machine-readable debian/copyright file Version 1.0",
but everybody knows it _is_ DEP-5; it is even in the spec, in the
second sentence of the abstract. The spec _is_ still DEP-5; being
accepted doesn't change that.

> I think this undermines your previous point about it being less prone to
> failure - if we could trust upstream assertions on copyright, the NEW
> review wouldn't be a problem in the first place.

I strongly disagree. First of all, upstream knows way better where
they copy the code from than packagers do. And projects that use REUSE
are more likely to write that down somewhere than your average NPM
package that puts an "under MIT license" in the readme and copies
minified code from everywhere.
And as a second point, if you write a debian/copyright, you are most
likely to trust what is in the header, and I suspect the copyright
review in NEW is no different in this regard. I mean, how can one
even know if the copyright information is wrong?
Yes, there are cases where copyright information is missing and one
can try to search for it (I've done this more than once), but if a
project uses REUSE headers, this doesn't happen.

Regards,
Stephan

[1] https://gitlab.cern.ch/geant4/geant4
[2] https://github.com/fsfe/reuse-docs/issues/81

