
Re: Automated copyright reviews using REUSE/SPDX as alternative to DEP-5



On Fri, Jan 28, 2022 at 9:42 AM Phil Morrell <debian@emorrp1.name> wrote:
>
> On Thu, Jan 27, 2022 at 11:27:45AM +0100, Stephan Lachnit wrote:
> > On Thu, Jan 27, 2022 at 12:39 AM Phil Morrell <debian@emorrp1.name> wrote:
> > >
> > > TLDR: I think REUSE.software is a bad idea that is worse than what
> > > Debian already invented with Machine-readable debian/copyright file. I
> > > guess if upstream uses it, there's no reason not to ignore that as a
> > > source of copyright assertions.
> >
> > I expected some concerns about the complexity of the SPDX document,
> > but certainly not about standardized copyright information in source
> > files.
> >
> > Yes, Debian may have invented the machine-readable copyright bill, but
> > not machine-readable copyright information in source files.
>
> Erm, no that's not what I'm saying? I'll requote my agreement with
>
> > > I *am* a big fan of SPDX-License-Identifier

Yes I saw that line, but you also wrote
> TLDR: I think REUSE.software is a bad idea

I apologize for the misunderstanding. Maybe next time write something
like "While I am a big fan of copyright information in source files, I
find certain aspects of REUSE bad".
There may be valid concerns about the spec, but they are not really
relevant to my proposal: it is mainly about allowing a copyright format
other than [DEP-5 style], one which _can_ be created automatically via
REUSE.

> I will admit I'm somewhat skeptical in how often file-level copies
> happen these days, rather than folder-level or whole project forks. But
> it's easy enough to convince people to slap a single-line license
> comment in to avoid ambiguity.

Obviously we as Debian are not big fans of file-level copies anyway,
but let's just say that REUSE wasn't written just for Debian. There
are enough industry projects that use tons of imported code, whether we
like it or not, and that is certainly better with standardized
copyright information than without.

> > what REUSE is all about, and it greatly reduces manual labor - I don't
> > understand how this can be seen as bad.
>
> Because being REUSE-compliant IMO greatly *increases* manual labor as
> soon as you're dealing with non-text forms, multiple authors and
> aggregation of differing copyright assertions. These are all things that
> the debian copyright-format has already solved without (as much) manual
> busywork, so if upstream is agreeable to formally documenting their
> copyrights, I'd much rather they just used that format in LICENSE.

But it does not increase the manual labor for us! It actually
decreases our work, that's what this is all about!

The main point of my proposal: we, as package maintainers, don't have
to do the bulk of the work anymore, because upstream does it. We can
just use this information, which we would otherwise have compiled by
hand. This is not about pushing REUSE to upstream projects from our
side at all, but rather about using it downstream to decrease manual
labor if it already exists upstream.

> > Yes it is called "Machine-readable debian/copyright file Version 1.0",
> > but everybody knows it _is_ DEP-5, it is even in the spec in the
> > second sentence of the abstract.
>
> Sure, and that's fine as a colloquialism, but you haven't addressed my
> objection to REUSE formalising that as part of the spec.

If you look at [1]:
> Definitions
> [...]
> DEP5 — Machine-readable debian/copyright file, Version 1.0. Where the REUSE Specification and DEP5 state different things, the REUSE Specification takes precedence. Specifically in the case of the Copyright and License tags.

And they link to the proper spec, so it is nothing but an abbreviation.

> > The spec _is_ still DEP-5, being accepted doesn't change that.
>
> Sure it does, compare `#files-field` in both specs, from 2019 policy
> upgrading checklist 4.4.1. Perhaps that change should have bumped a
> version number, but it's a bit late now.

Oh, thanks, I didn't know that!

> That has not been my experience for projects without a long history,
> they tend to not care about copyright initially, slap a generic
> assertion on it at some point, but without noticing they've included
> e.g. an embedded copy of zlib or less formally - used an image with a
> vague gratis use but omitting a formal license.
>
> It's only either later, or from the ITP scrutiny that some confusion
> over pedigree comes to light, someone fires off an email to an early
> contributor and gets the accurate license information. Or for Debian,
> the path gets added to Files-Excluded and patched out of compilation.

Yes, surely copyright assertion mistakes happen from time to time. But
these can happen everywhere, whether they slap a generic assertion on
it or not. Just using the information REUSE provides doesn't mean that
the code is free from any review, just from the tedious copyright
review. If one detects an embedded copy of zlib, or really any other
embedded code, this needs to be addressed anyway. Detecting these has
nothing to do with automated copyright review tools; it depends on
whether a maintainer actually spots the embedded code.

Maybe I should clarify what I mean by automated: I want to automate
the process of creating and updating d/copyright, as well as the
review in NEW.
I still consider it the maintainer's duty to make sure that the source
code actually uses REUSE correctly. If you think we can't trust our
maintainers enough, I'm open to discussing the idea that new source
packages still need a manual copyright review in NEW. That would be
mostly equivalent to the current situation, except that updates to
debian/copyright could be enforced even if there is no new binary
(which is currently not the case, btw).
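
To make "automated" a bit more concrete, here is a minimal sketch of
the kind of processing I have in mind. It is not the reuse tool and not
an existing Debian tool; the function names (extract_tags,
build_stanzas) are made up for illustration. It assumes file contents
are already loaded into a dict, reads the SPDX-License-Identifier and
SPDX-FileCopyrightText tags, and groups files with identical
information into Files stanzas. License identifiers are copied
verbatim, so mapping SPDX identifiers to the short names used in
debian/copyright is left out.

```python
import re
from collections import defaultdict

# Tag names as used in REUSE-style comment headers.
LICENSE_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")
COPYRIGHT_RE = re.compile(r"SPDX-FileCopyrightText:\s*(.+)")


def _clean(value):
    # Drop a trailing C-style comment terminator and surrounding whitespace.
    return value.strip().rstrip("*/").strip()


def extract_tags(text):
    """Return (licenses, holders) found in one file, as sorted tuples."""
    licenses = tuple(sorted({_clean(m) for m in LICENSE_RE.findall(text)}))
    holders = tuple(sorted({_clean(m) for m in COPYRIGHT_RE.findall(text)}))
    return licenses, holders


def build_stanzas(files):
    """Group files by identical tags; `files` maps path -> file content.

    Returns (stanza text, list of paths lacking a license or copyright tag).
    Hypothetical helper: paths containing spaces are not handled.
    """
    groups = defaultdict(list)
    untagged = []
    for path in sorted(files):
        licenses, holders = extract_tags(files[path])
        if not licenses or not holders:
            untagged.append(path)  # still needs a human to look at it
            continue
        groups[(licenses, holders)].append(path)

    stanzas = []
    for (licenses, holders), paths in groups.items():
        lines = [
            "Files: " + " ".join(paths),
            # Continuation lines of a field must start with whitespace.
            "Copyright: " + "\n           ".join(holders),
            # Assumption: several identifiers in one file all apply.
            "License: " + " and ".join(licenses),
        ]
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas), untagged
```

Anything that ends up in the untagged list is exactly the part that
would still need a manual look by the maintainer.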

> > And projects that use REUSE
> > are more likely to write that somewhere down as your average NPM
> > package that puts a "under MIT license" in the readme and copies
> > minified code from everywhere.
>
> Sure, but instead of wasting my time encouraging upstream to become
> REUSE-compliant, I would much rather promote a better standard like
> Debian's. I was curious and found approximately 40 instances of REUSE in
> codesearch, but multiple thousands of the (optional) copyright-format.

First of all, I don't want to force any maintainer to promote REUSE
upstream - this is entirely up to the individual. I also don't want to
force maintainers to use REUSE where upstream supports it; if they
want, they can still do it with [DEP-5]. Again, this proposal is for an
*alternative* to the current way. I don't see the need to deprecate
our current system.

Second of all - feel free to promote Debian's spec! I won't stop or
even discourage you or anyone else from implementing it - any standard
is better than the "mess" we have right now. Even REUSE was influenced
by it. But the [DEP-5] spec is clearly designed to be included in
Debian's packaging, and less so for outside use.

Quote from the spec [2]:
> Establishes a standard, machine-readable format for debian/copyright files within Debian packages

Imagine an upstream project using [DEP-5]. Afaik there is no tool that
can merge it with the copyright information for the packaging in the
debian/ folder, so that we can actually use it for automation of
upstream copyright information. Yes one can copy and paste, which is
better than nothing, but worse than what REUSE aims to offer. Clearly,
the specs have a different scope: REUSE for upstream use, [DEP-5] for
downstream use.
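
For illustration, the merge step I am missing could look roughly like
the sketch below. This is purely hypothetical: merge_copyright and
split_paragraphs are names I made up, not an existing tool, and the
sketch assumes upstream ships a well-formed copyright-format file whose
paragraphs are separated by blank lines. It drops any debian/ stanzas
upstream may carry and appends the packaging stanza last, since the
last paragraph matching a file is the one that applies.

```python
def split_paragraphs(text):
    """Split a copyright-format document into paragraphs on blank lines."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def merge_copyright(upstream_text, packaging_paragraph):
    """Combine an upstream copyright-format file with a debian/* stanza.

    Hypothetical helper: upstream paragraphs are kept in order, except
    that stanzas starting with "Files: debian/" are dropped (assumption:
    the packaging stanza supplied by the maintainer replaces them).
    """
    kept = [
        p for p in split_paragraphs(upstream_text)
        if not p.startswith("Files: debian/")
    ]
    kept.append(packaging_paragraph.strip())
    return "\n\n".join(kept) + "\n"
```

Even with something like this, the result would still need the usual
checks, e.g. that every license short name has a matching license text.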


To stress it again: This is *not* about deprecating the current
[DEP-5] spec, whether REUSE is a good spec, whether there could be a
better spec, or whether upstream projects should use it. This *is*
about using the information REUSE provides, where upstream supports it,
as a way to create copyright information automatically. In particular,
the underlying question is whether we want to allow the *SPDX* standard
(not REUSE) as an alternative way for developers to provide
debian/copyright. REUSE naturally appears in this process because it
allows such SPDX documents to be created automatically, but REUSE
itself has no direct impact on the use of the SPDX spec in
debian/copyright.

Regards,
Stephan


[1] https://reuse.software/spec/
[2] https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

