
Re: Automated copyright reviews using REUSE/SPDX as alternative to DEP-5



On Thu, Jan 27, 2022 at 11:27:45AM +0100, Stephan Lachnit wrote:
> On Thu, Jan 27, 2022 at 12:39 AM Phil Morrell <debian@emorrp1.name> wrote:
> >
> > TLDR: I think REUSE.software is a bad idea that is worse than what
> > Debian already invented with Machine-readable debian/copyright file. I
> > guess if upstream uses it, there's no reason to ignore that as a
> > source of copyright assertions.
> 
> I expected some concerns about the complexity of the SPDX document,
> but certainly not about standardized copyright information in source
> files.
>
> Yes, Debian may have invented the machine-readable copyright file, but
> not machine-readable copyright information in source files.

Erm, no, that's not what I'm saying. I'll requote my agreement with that:

> > I *am* a big fan of SPDX-License-Identifier

I will admit I'm somewhat skeptical about how often file-level copies
happen these days, as opposed to folder-level copies or whole-project
forks. But it's easy enough to convince people to slap in a single-line
license comment to avoid ambiguity.
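
For illustration, here is a minimal Python sketch of how cheaply such a
single-line tag can be read back by a tool. The function name
`spdx_expressions` and the sample input are made up for this example,
and the handling of trailing comment closers is an assumption rather
than anything the SPDX specification prescribes.

```python
import re

# Matches the tag anywhere on a line, so it works inside any comment syntax.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_expressions(text):
    """Return every SPDX license expression declared in `text` (hypothetical helper)."""
    found = []
    for line in text.splitlines():
        match = SPDX_TAG.search(line)
        if not match:
            continue
        expr = match.group(1).strip()
        # Assumption: drop a trailing block-comment closer if one is present.
        for closer in ("*/", "-->"):
            if expr.endswith(closer):
                expr = expr[: -len(closer)].strip()
        found.append(expr)
    return found


# Made-up sample file content, not taken from any real project.
example = "/* SPDX-License-Identifier: MIT */\nint main(void) { return 0; }\n"
print(spdx_expressions(example))
```

The tag only answers the license question for that one file; it says
nothing about copyright holders, which is where the aggregation problem
starts.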

> what REUSE is all about, and it greatly reduces manual labor - I don't
> understand how this can be seen as bad.

Because being REUSE-compliant IMO greatly *increases* manual labor as
soon as you're dealing with non-text forms, multiple authors and
aggregation of differing copyright assertions. These are all things that
the debian copyright-format has already solved without (as much) manual
busywork, so if upstream is agreeable to formally documenting their
copyrights, I'd much rather they just used that format in LICENSE.
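
As a rough sketch of what that aggregation looks like, the Python below
groups paths that share the same copyright and license into one
copyright-format paragraph each. The `Files`, `Copyright` and `License`
field names come from the format itself; the function name
`render_stanzas`, the paths and the holders are all hypothetical.

```python
from collections import defaultdict


def render_stanzas(assertions):
    """Group paths sharing one (copyright, license) pair into a single paragraph.

    `assertions` is an iterable of (path_pattern, holder, license_short_name)
    tuples; this input shape is an assumption made for the example.
    """
    grouped = defaultdict(list)
    for path, holder, license_name in assertions:
        grouped[(holder, license_name)].append(path)

    stanzas = []
    for (holder, license_name), paths in grouped.items():
        # Continuation lines of a field must start with whitespace.
        files = "\n       ".join(sorted(paths))
        stanzas.append(
            f"Files: {files}\nCopyright: {holder}\nLicense: {license_name}"
        )
    return "\n\n".join(stanzas)


# Hypothetical project layout and holders, for illustration only.
sample = [
    ("src/*", "2020 Example Author", "Expat"),
    ("vendor/zlib/*", "1995 Example Vendor", "Zlib"),
    ("icons/*.png", "2020 Example Author", "Expat"),
]
print(render_stanzas(sample))
```

Three assertions collapse into two paragraphs, and the image files need
no sidecar metadata because the pattern in `Files` covers them.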

> > Firstly, I didn't think it was called DEP-5 anymore - it was accepted
> > into policy in 2012 as "copyright-format" titled "Machine-readable
> > debian/copyright file", so no longer a proposal for enhancement. This
> > would be a minor pedantic point (a colloquialism) except for the fact
> > that REUSE encourages it as part of their interface: `.reuse/dep5`.
> 
> Yes it is called "Machine-readable debian/copyright file Version 1.0",
> but everybody knows it _is_ DEP-5, it is even in the spec in the
> second sentence of the abstract.

Sure, and that's fine as a colloquialism, but you haven't addressed my
objection to REUSE formalising that as part of the spec.

> The spec _is_ still DEP-5, being accepted doesn't change that.

Sure it does: compare `#files-field` in the two specs, which diverged
with the change listed in the upgrading checklist for Policy 4.4.1
(2019). Perhaps that change should have bumped a version number, but
it's a bit late now.

> > I think this undermines your previous point about it being less prone to
> > failure - if we could trust upstream assertions on copyright, the NEW
> > review wouldn't be a problem in the first place.
> 
> I strongly disagree. First of all, upstream knows way better where
> they copy the code from than packagers do.
> ...
> And as a second point, if you write a debian/copyright, you are most
> likely to trust what is in the header, and I suspect the copyright
> review in NEW is not different in this regard. I mean how can one
> even know if the copyright information is wrong?
> Yes there are cases where copyright information is missing and one can
> try to search it, I've done this not just once, but if a project uses
> REUSE headers, this doesn't happen.

That has not been my experience with projects that lack a long history.
They tend not to care about copyright initially, then slap a generic
assertion on the project at some point without noticing that they've
included e.g. an embedded copy of zlib or, less formally, an image
offered under vague gratis terms with no formal license.

It's only later, or under ITP scrutiny, that the confusion over
pedigree comes to light: someone fires off an email to an early
contributor and gets the accurate license information. Or, for Debian,
the path gets added to Files-Excluded and patched out of compilation.

> And projects that use REUSE
> are more likely to write that somewhere down than your average NPM
> package that puts a "under MIT license" in the readme and copies
> minified code from everywhere.

Sure, but instead of wasting my time encouraging upstream to become
REUSE-compliant, I would much rather promote a better standard like
Debian's. I was curious and found approximately 40 instances of REUSE in
codesearch, but multiple thousands of the (optional) copyright-format.
