
Re: Automated copyright reviews using REUSE/SPDX as alternative to DEP-5



Hi Jonas,

On Tue, Feb 8, 2022 at 4:39 PM Jonas Smedegaard <jonas@jones.dk> wrote:
>
> I am sceptical towards this proposal.
>
> An important feature to me with current machine-readable format is that
> really it is machine-and-human-readable.

Thank you for your input! I'm aware of this concern, but I think it
can be solved.

For one, while not as trivial to understand as the current
machine-readable copyright, it's still "human-readable" (i.e. a
tag:value style text
file). I would do the following comparison: if you only know Python
(DEP-5), C++ (SPDX) might look a bit weird, but you can get the gist
of it.

However, I also think the human-readable aspect is less important here
because it is an output format. What I mean by this is that the
information is already there in a human-readable way: either via REUSE
or in the file headers directly. While it is theoretically possible to
write SPDX documents by hand, I would not treat them with the same
trust as one created by REUSE.

> Another important feature to me is that there is only one format (in
> addition to unformatted content, which hopefully we can put past us at
> some point).
>
> Today, I can as DD help proof-read and change *any* package in Debian.

Regarding reviews: I plan to write an SPDX-to-DEP5 converter anyway to
get a better feel for the spec. I will probably also write a copyright
review tool that shows the copyright header of each file next to the
DEP5 or SPDX information for validation / manual review. This will
make proof-reading copyright information much easier.
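
To make the converter idea a bit more concrete, here is a minimal
sketch of what it could look like. It is not an existing tool. The
sample input, the function names and the grouping rule (files with the
same copyright and license end up in one stanza) are assumptions made
for illustration. It only handles single-line tag:value entries and
passes SPDX license identifiers through unchanged, although DEP5 short
names can differ, and it does not emit the license texts.

```python
import re
from collections import defaultdict

# Hypothetical sample input; real SPDX documents carry many more fields.
SAMPLE_SPDX = """\
FileName: ./src/main.c
SPDXID: SPDXRef-main
LicenseConcluded: MIT
FileCopyrightText: <text>2021 Jane Doe</text>

FileName: ./src/util.c
SPDXID: SPDXRef-util
LicenseConcluded: MIT
FileCopyrightText: <text>2021 Jane Doe</text>

FileName: ./lib/parser.c
SPDXID: SPDXRef-parser
LicenseConcluded: Apache-2.0
FileCopyrightText: <text>2019 Example Corp</text>
"""

DEP5_FORMAT = "https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/"


def parse_spdx_files(text):
    """Collect one dict per FileName entry (single-line values only)."""
    files = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank lines and anything that is not tag:value
        tag, value = line.split(":", 1)
        tag, value = tag.strip(), value.strip()
        if tag == "FileName":
            current = {"FileName": value}
            files.append(current)
        elif current is not None:
            # Unwrap the optional <text>...</text> wrapper around a value.
            match = re.fullmatch(r"<text>(.*)</text>", value)
            current[tag] = match.group(1) if match else value
    return files


def to_dep5(files):
    """Group files by (copyright, license) and render DEP5-style stanzas."""
    groups = defaultdict(list)
    for entry in files:
        key = (
            entry.get("FileCopyrightText", "NOASSERTION"),
            entry.get("LicenseConcluded", "NOASSERTION"),
        )
        name = entry["FileName"]
        groups[key].append(name[2:] if name.startswith("./") else name)

    stanzas = ["Format: " + DEP5_FORMAT]
    for (copyright_text, license_id), names in groups.items():
        stanzas.append(
            "Files: " + " ".join(names) + "\n"
            "Copyright: " + copyright_text + "\n"
            "License: " + license_id
        )
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(to_dep5(parse_spdx_files(SAMPLE_SPDX)), end="")
```

A review tool could reuse the parsed entries and show each file's
actual header next to the recorded copyright and license.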

But to stress this again: the goal is to *replace* the manual
copyright reviews with something much better: automatic copyright
reviews. There are three areas of interest for copyright information:
a) the developers writing it, b) the users receiving it and c) the
legal side.

Regarding a: For writing by hand DEP5 is better, but for automation
SPDX is equally good.
Regarding b: I think they don't care anyway. Which user really reads
debian/copyright? If at all, you are interested in the
copyright of a certain library you wish to use, but this doesn't
require the extensive file-by-file information of DEP5. Most likely
the documentation provides much clearer information.
Regarding c: SPDX is as good as DEP5, if not better, thanks to its
file hashes.
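
To illustrate why the file hashes help with automation: an SPDX file
entry can carry a checksum line such as "FileChecksum: SHA1: <hex>",
so a tool can detect when a file no longer matches what was reviewed.
The following is only a sketch under that assumption. The function
names are made up, and it deliberately supports SHA1 only.

```python
import hashlib


def parse_checksum(value):
    """Split a value like 'SHA1: <hex>' into (algorithm, hex digest)."""
    algorithm, digest = value.split(":", 1)
    return algorithm.strip().upper(), digest.strip().lower()


def data_matches(data, checksum_value):
    """Return True if the bytes hash to the recorded SHA1 checksum."""
    algorithm, expected = parse_checksum(checksum_value)
    if algorithm != "SHA1":
        raise ValueError("this sketch only handles SHA1, got " + algorithm)
    return hashlib.sha1(data).hexdigest() == expected


def file_matches(path, checksum_value):
    """Same check for a file on disk (reads the whole file into memory)."""
    with open(path, "rb") as handle:
        return data_matches(handle.read(), checksum_value)
```

A mismatch would mean the file changed since the copyright
information was recorded and needs to be looked at again.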

> If we permit a debian/copyright format that is not human-readable, it
> means that I cannot confidently proof-read and change the contents of
> the debian subdir without the help of machine-parsers, and I would need
> to know two formats with different goals.

I don't see the problem with machine parsers. We already use a lot of
different tools for our processes (git, dput, dpkg, debhelper,
lintian, uscan, a mail program, a text editor, ...), adding one more
shouldn't be a big deal. It needs to be provided of course, but I plan
to do that.

> I would like to instead welcome the REUSE developers in helping Debian
> evolve next version of the existing machine-readable format to better
> align with SPDX.

While this would be nice, I think it is just unrealistic. I may
implement DEP5 output for REUSE, but I still want to use SPDX because
it is an existing industry standard that already has all the
"features" we want. Adding things like file hashes and referencing /
merging other DEP5 documents is certainly possible, but it would make
the format less readable and, in the end, just SPDX in a different
guise.


On Tue, Feb 8, 2022 at 5:00 PM Scott Kitterman <debian@kitterman.com> wrote:
>
> Since Debian policy requires verbatim copies of licenses (or links to
> /usr/share/common-licenses), I think any policy compliant debian/copyright will
> have to be human readable, but I'm not that familiar with SPDX, so maybe it
> will surprise me.

You can find an example in my initial mail [1].

> It would be good to understand how this proposal supports Debian Policy.

It would require a minor change: putting the verbatim license texts in
a single file would no longer be possible. But I don't see why just
copying the licenses to "/usr/share/doc/PACKAGE/licenses/LICENSE" in
addition to the SPDX-formatted debian/copyright would be any worse
than the current way.


Regards,
Stephan

[1] https://lists.debian.org/debian-devel/2022/01/msg00309.html

