
Automated copyright reviews using REUSE/SPDX as alternative to DEP-5



Since I feel this fits the current discussion on the mailing list,
let me quickly introduce you to an idea I have had for a while to
improve the copyright review situation.
TLDR: for projects using REUSE, we could generate d/copyright
automatically and approve the copyright check in NEW automatically.

- What is REUSE?
The REUSE specification [1] is a specification for making copyright
information machine-readable in the source files themselves. It is
straightforward to implement: add (e.g.) "SPDX-FileCopyrightText: 2019
Jane Doe <jane@example.com>" and "SPDX-License-Identifier:
GPL-3.0-or-later" as comments to your source file's header and you are
done. The license identifiers are standardized by SPDX [2] and are
similar to what we already use in Debian (see also [7], although it is
a bit outdated).
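
To make the header convention concrete, here is a minimal Python
sketch of how a tool could pick these two tags out of a file. The
function name read_spdx_tags and the regular expression are my own
simplification for illustration, not what the "reuse" tool actually
does (for instance, it does not strip a trailing "*/" from block
comments).

```python
import re

# The two tag names come from the REUSE spec; the regex is a simplification.
TAG_RE = re.compile(r"SPDX-(FileCopyrightText|License-Identifier):\s*(.+?)\s*$")


def read_spdx_tags(text):
    """Collect copyright lines and license expressions from file content."""
    info = {"copyrights": [], "licenses": []}
    for line in text.splitlines():
        match = TAG_RE.search(line)
        if not match:
            continue
        tag, value = match.groups()
        key = "copyrights" if tag == "FileCopyrightText" else "licenses"
        info[key].append(value)
    return info


# Hypothetical source file using the header from the paragraph above.
EXAMPLE_SOURCE = """\
# SPDX-FileCopyrightText: 2019 Jane Doe <jane@example.com>
# SPDX-License-Identifier: GPL-3.0-or-later

print("hello")
"""

if __name__ == "__main__":
    print(read_spdx_tags(EXAMPLE_SOURCE))
```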

The spec is maintained by the Free Software Foundation Europe (FSFE)
and is already implemented in several projects [3]. They also provide
a tool (available as "reuse" in Debian [4]) that can lint a source
folder for REUSE compliance and export the license information to an
SPDX bill of materials.

- What is an SPDX bill of materials?
It is a machine-readable format that specifies the license of each
file in tag/value style, like DEP-5. Compared to DEP-5, however, it is
much less human-readable: it includes much more meta information and
does not contain the license texts. One useful aspect is that it also
includes the checksum of each file. I appended an example of what such
a document might look like below.

The spec is from the Software Package Data Exchange (SPDX), a project
hosted by the Linux Foundation. The spec is also available as ISO/IEC
5962:2021.

- What has this to do with Debian?
My idea is to allow SPDX documents in addition to DEP-5. The advantage
is that - if supported upstream - REUSE can generate such reports
automatically during package build time, so there is no need to write
d/copyright manually anymore. It is also much less error-prone, as
this can be done every time there is a new source package and does not
suffer from human mistakes like forgetting to check some files during
the copyright review.

The license identifiers can be parsed to check whether the package
belongs in main/contrib or non-free (except when custom licenses are
used). Packages leveraging REUSE could skip the manual d/copyright
check in NEW entirely, even when it is a new source package. Writing a
sanity validator would not be a hard task; one probably already exists.
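
As a rough idea of what such a check could look like, here is a small
Python sketch. Everything in it is an assumption for illustration: the
allow-list ASSUMED_FREE is not an official list of DFSG-free licenses,
and the names license_ids and classify are made up. It is deliberately
conservative: any custom "LicenseRef-" identifier, any unknown
identifier (including exception names after WITH), and any empty input
falls back to manual review.

```python
import re

# Illustrative allow-list only; a real one would need to be agreed on
# and maintained by the project.
ASSUMED_FREE = {
    "MIT",
    "Apache-2.0",
    "BSD-3-Clause",
    "CC0-1.0",
    "GPL-2.0-or-later",
    "GPL-3.0-or-later",
}

OPERATORS = {"AND", "OR", "WITH"}


def license_ids(expression):
    """Split an SPDX license expression into its bare identifiers."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    return [token for token in tokens if token not in OPERATORS]


def classify(expressions):
    """Return 'auto-approvable' only if every identifier is on the allow-list."""
    seen_any = False
    for expression in expressions:
        for identifier in license_ids(expression):
            seen_any = True
            if identifier.startswith("LicenseRef-"):
                return "needs-manual-review"  # custom license
            if identifier not in ASSUMED_FREE:
                return "needs-manual-review"  # unknown or non-free
    return "auto-approvable" if seen_any else "needs-manual-review"


if __name__ == "__main__":
    print(classify(["MIT", "GPL-3.0-or-later OR Apache-2.0"]))
    print(classify(["MIT AND LicenseRef-Custom"]))
```

Treating "OR" like "AND" is stricter than necessary, but it keeps the
sketch short and errs on the side of a human looking at the package.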

Note that since the license texts are no longer part of d/copyright,
they have to be provided in another way. REUSE specifies that
licenses are in a top-level folder called "LICENSES", so we could
simply install that folder alongside the copyright file. We could also
depend on the "spdx-licenses" package [5] and symlink all non-custom
licenses to reduce duplicate files; however, since a license usually
needs to be shipped with any code/binary distribution, this might get
a bit complicated.
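
A packaging helper would at least want to verify that every license
identifier used in the package has its text in that folder. A minimal
Python sketch, assuming the texts are stored as
"LICENSES/<identifier>.<extension>" (the function name
missing_license_texts is hypothetical):

```python
import tempfile
from pathlib import Path


def missing_license_texts(identifiers, licenses_dir):
    """Return identifiers that have no matching file in licenses_dir.

    Assumes every file is named '<identifier>.<extension>', so the
    identifier is the file name without its last suffix.
    """
    available = {p.stem for p in Path(licenses_dir).iterdir() if p.is_file()}
    return sorted(set(identifiers) - available)


if __name__ == "__main__":
    # Demo with a throwaway directory standing in for LICENSES/.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "MIT.txt").write_text("license text placeholder\n")
        print(missing_license_texts(["MIT", "Apache-2.0"], tmp))
```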

Another, IMHO less preferred, way would be to write a converter tool
from SPDX to DEP-5, but still do auto-approvals. Such a converter tool
has been proposed before [6].
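
To give an impression of how small the core of such a converter could
be, here is a Python sketch that groups files by copyright holder and
license and prints DEP-5 stanzas. It is not the tool proposed in [6];
the function name to_dep5 and the input layout (dicts with "name",
"license" and "copyright" keys) are assumptions. It also does not map
SPDX identifiers to DEP-5 short names, and it does not emit the
standalone License stanzas with the license texts.

```python
from collections import defaultdict

DEP5_FORMAT = "https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/"


def to_dep5(files):
    """Render file entries as DEP-5 text.

    files: iterable of dicts with 'name', 'license' and 'copyright' keys.
    """
    # Files sharing the same copyright and license end up in one stanza.
    groups = defaultdict(list)
    for entry in files:
        groups[(entry["copyright"], entry["license"])].append(entry["name"])

    stanzas = [f"Format: {DEP5_FORMAT}"]
    for (copyright_text, license_id), names in sorted(groups.items()):
        cleaned = sorted(n[2:] if n.startswith("./") else n for n in names)
        file_lines = "\n ".join(cleaned)  # continuation lines start with a space
        stanzas.append(
            f"Files: {file_lines}\n"
            f"Copyright: {copyright_text}\n"
            f"License: {license_id}"
        )
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(to_dep5([
        {"name": "./update_copyright_years.py", "license": "MIT",
         "copyright": "2022 Stephan Lachnit <stephanlachnit@debian.org>"},
    ]))
```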

- Final thoughts:
Besides the quality-of-life improvements, using this also has the
advantage of using an industry standard, i.e. shared work on tooling.
I heard that Fedora is also thinking about implementing this idea.

I've been in contact with one of the people responsible at the FSFE
for a while, and they really like this idea and are open to
suggestions from our side if we need any particular changes to the
tooling. I already have a couple of changes in mind that we would
need, in particular with regard to recording the copyright of the
debian folder without adding a header to each file, but upstream
already has some ideas for that.

Note that I don't want DEP-5 to go away - it is unlikely that every
project will follow the REUSE spec and writing an SPDX document by
hand has no significant advantages over DEP-5. Besides, using the
file-exclusion function in DEP-5 for uscan is quite useful for ds/dfsg
packages (although that could also be moved to an external file).

For now, let me just hear what you think about this idea in general.
If someone would be willing to help in this endeavor (e.g. creating
dh_reuse, writing a DEP), let me know.

Regards,
Stephan

[1] https://reuse.software/spec/
[2] https://spdx.dev/licenses/
[3] https://api.reuse.software/projects
[4] https://tracker.debian.org/pkg/reuse
[5] https://tracker.debian.org/pkg/spdx-licenses
[6] https://wiki.debian.org/SPDX
[7] https://wiki.debian.org/Proposals/CopyrightFormat#Differences_between_DEP5_and_SPDX


Example for SPDX bill of materials:
"""
SPDXVersion: SPDX-2.1
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: u2
DocumentNamespace: http://spdx.org/spdxdocs/spdx-v2.1-0ed6ddb2-edbd-4664-8b7e-029432c8e421
Creator: Person: Anonymous ()
Creator: Organization: Anonymous ()
Creator: Tool: reuse-0.14.0
Created: 2022-01-26T10:42:59Z
CreatorComment: <text>This document was created automatically using
available reuse information consistent with REUSE.</text>
Relationship: SPDXRef-DOCUMENT describes SPDXRef-3c8056cd1f4f60322830f1e79d55ea13

FileName: ./update_copyright_years.py
SPDXID: SPDXRef-3c8056cd1f4f60322830f1e79d55ea13
FileChecksum: SHA1: 65fc75079eb9d85953b39c6fb832e86c7b7e113a
LicenseConcluded: NOASSERTION
LicenseInfoInFile: MIT
FileCopyrightText: <text>SPDX-FileCopyrightText: 2022 Stephan Lachnit
<stephanlachnit@debian.org></text>
"""
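
For anyone who wants to experiment with this format, here is a minimal
Python sketch that reads the per-file entries from a tag/value
document like the one above. It is only an illustration: the function
name parse_files is made up, it expects every tag to start its own
line (so values must not be wrapped by the mail client), it only
handles multi-line values inside <text>...</text>, and a repeated tag
simply overwrites the earlier value.

```python
import re

TAG_LINE = re.compile(r"^([A-Za-z]+):\s*(.*)$")


def parse_files(document):
    """Return one dict per 'FileName:' section of a tag/value document."""
    files = []
    current = None
    lines = iter(document.splitlines())
    for line in lines:
        match = TAG_LINE.match(line)
        if not match:
            continue
        tag, value = match.groups()
        # A <text> value may continue on following lines until </text>.
        while value.startswith("<text>") and "</text>" not in value:
            following = next(lines, None)
            if following is None:
                break
            value += "\n" + following
        if value.startswith("<text>"):
            value = value[len("<text>"):].split("</text>")[0]
        if tag == "FileName":
            current = {"FileName": value}
            files.append(current)
        elif current is not None:
            current[tag] = value  # document-level tags are skipped
    return files


SAMPLE = """\
SPDXVersion: SPDX-2.1
DocumentName: u2

FileName: ./update_copyright_years.py
SPDXID: SPDXRef-3c8056cd1f4f60322830f1e79d55ea13
FileChecksum: SHA1: 65fc75079eb9d85953b39c6fb832e86c7b7e113a
LicenseConcluded: NOASSERTION
LicenseInfoInFile: MIT
FileCopyrightText: <text>SPDX-FileCopyrightText: 2022 Stephan Lachnit
<stephanlachnit@debian.org></text>
"""

if __name__ == "__main__":
    for entry in parse_files(SAMPLE):
        print(entry["FileName"], entry["LicenseInfoInFile"])
```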

