Re: making debian/copyright machine-interpretable
Sam Hocevar <sam <at> zoy.org> writes:
> On Sat, Aug 04, 2007, Joey Hess wrote:
[initial comment moved here to fool gmane's "you seem to be top-posting" message]
I only came across this interesting thread today (I was without network access
in early August). Although I haven't read everything, it seems that one point
has not been raised. If you answer, please Cc me since I'm not subscribed to any
Debian list except debian-tex-maint at the moment.
> > * I don't see much benefit in putting freeform text at the top of the
> > file. Keeping it all at the bottom would simplify parsing/validating.
> I tend to agree with you. The first version used to be like this,
> and I got many suggestions for adding freeform text at the beginning
> in order to keep the file human-readable so I switched to allowing it
> everywhere. I'll wait for more comments on why it may be useful, but the
> paragraph below about tarball origin seems already a valid use case.
Note that machine-readable copyright files might get very long (tetex-base in
etch has more than 1400 lines, most of which are in our poor man's
machine-readable format, but that covers only a subset of the files in the
package). Therefore I think we should either allow freeform text at the top of
the file, or at least provide a field that indicates "Hey, it might be worth
scrolling down to the end".
Two more points:
- Packages are buggy, and so are copyright files. We should add a
machine-readable field that indicates that the rest of the machine-readable
information might not be correct or complete. Reasons might be "Upstream
acknowledges that the license information in subdir foo/ is partly
incorrect/nonfree and is working on relicensing or replacing the code,
lenny-ignore tag granted" or "This package is a compilation of software from
various sources. The integration work done by upstream is Copyright 2006 Alice
Aiken, 2005&2006 Bob Brown, but the upstream license file does not indicate
which files were altered or added by them. The list of these files was
prepared by the Debian maintainer and might be incomplete".
These reasons can be given in free form, but there should be a
machine-readable field to indicate that such information exists.
- Looking at tetex-base's copyright file, I notice that we keep more
per-filepattern information in it. This is probably not needed for most
packages, but it should be possible to easily add it in case it is useful.
It could e.g. go into a notes field, whose content is not necessarily
machine-readable, but clearly assigned to a particular filepattern stanza.
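Both proposals could be handled by an ordinary deb822-style stanza parser. The following is only a minimal sketch: the field names Caveat and Notes, the file pattern, and the helper parse_stanzas are hypothetical illustrations, not part of the proposal as drafted.

```python
def parse_stanzas(text):
    """Split deb822-style text into a list of {field: value} dicts.

    Stanzas are separated by blank lines; a line that starts with
    whitespace continues the value of the previous field.
    """
    stanzas, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current stanza, if any.
            if current:
                stanzas.append(current)
            current, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Continuation line: append to the previous field's value.
            current[last] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            last = name.strip()
            current[last] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


# Hypothetical example: "Caveat" flags possibly incomplete data for the
# whole file, "Notes" carries free-form data tied to one Files stanza.
EXAMPLE = """\
Caveat: upstream file list prepared by the Debian maintainer,
 may be incomplete

Files: abstract/*
License: LPPL
Notes: abstract (verification data:1.1:1.1:2006-02-08:norbusan:)
"""

stanzas = parse_stanzas(EXAMPLE)
# A tool can detect the warning without understanding its wording.
needs_review = any("Caveat" in s for s in stanzas)
# Notes stay attached to the file pattern they describe.
notes = {s["Files"]: s.get("Notes") for s in stanzas if "Files" in s}
```

Because the caveat is a field of its own, a checking tool could decline to treat the file as authoritative whenever it is present, while humans still read the free-form reason.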
For example, section 3.4 of tetex-base's copyright file contains entries like these:
    3.4 List of licenses of individual parts
    % abstract: lppl (verification data:1.1:1.1:2006-02-08:norbusan:)
    % acronym: lppl (verification data:1.26:1.17:2006-03-07:frank:header)
This indicates that one part of tetex-base is the CTAN package abstract,
which is under the LPPL. Since this information is not in the teTeX license
file, we also record the following: the current version of abstract is 1.1;
the license information was verified by norbusan (Norbert Preining) for
version 1.1 on 2006-02-08; and he did not note in which file of the abstract
package the license information can be found. For acronym, I did the
verification (for version 1.17, on 2006-03-07), but there is a newer upstream
version (1.26) now, and I noted that the information is not in a particular
file but in the headers of the individual source files.
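To show that this "poor man's" format is already easy to consume, here is a minimal parsing sketch. The field order (current version, verified version, date, verifier, location) is inferred from the two entries and their description; the names LicenseEntry, parse_entry and the stale check are hypothetical, not part of any existing tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "% <part>: <license> (verification data:<current>:<verified>:<date>:<who>:<where>)"
ENTRY_RE = re.compile(
    r"^%\s*(?P<name>[^:]+):\s*(?P<license>\S+)\s*"
    r"\(verification data:(?P<data>[^)]*)\)\s*$"
)


@dataclass
class LicenseEntry:
    name: str
    license: str
    current_version: str     # latest upstream version known
    verified_version: str    # version whose license was checked
    verified_on: str         # ISO date of the check
    verified_by: str         # who did the check
    location: Optional[str]  # where the info lives; None if not recorded

    @property
    def stale(self) -> bool:
        # True when the current upstream version differs from the verified one.
        return self.current_version != self.verified_version


def parse_entry(line: str) -> LicenseEntry:
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a license entry: {line!r}")
    current, verified, date, who, where = m.group("data").split(":")
    return LicenseEntry(m.group("name").strip(), m.group("license"),
                        current, verified, date, who, where or None)


abstract = parse_entry(
    "% abstract: lppl (verification data:1.1:1.1:2006-02-08:norbusan:)")
acronym = parse_entry(
    "% acronym: lppl (verification data:1.26:1.17:2006-03-07:frank:header)")
```

With this, acronym.stale is true and abstract.stale is false, which is exactly the "needs re-verification" signal described above.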
I suggest that there should be a Notes: field. We would use it to indicate
the name of the part as used in the TeX Catalogue (where we keep the license
information) and the verification data in parentheses.
By the way, should we distinguish between LPPL-1.3a, LPPL-1.3b, and things
like LPPL-1+ and LPPL-1.2+ (versions 1.2 and older were not DFSG-compatible, but
of course any software with an "or later" clause is acceptable)?
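Whatever spelling is chosen, a checker would have to treat the version number and the "or later" marker separately. A minimal sketch, assuming short names of the form LPPL-1.3a or LPPL-1.2+ (the spelling and the helper lppl_acceptable are hypothetical) and taking the DFSG cutoff from the paragraph above rather than from any official list:

```python
import re

# Cutoff taken from the message: LPPL 1.2 and older are not DFSG-compatible.
LAST_NONFREE = (1, 2)

# Hypothetical short names such as "LPPL-1.3a", "LPPL-1", "LPPL-1.2+".
SPEC_RE = re.compile(
    r"^LPPL-(?P<major>\d+)(?:\.(?P<minor>\d+)(?P<rev>[a-z]?))?(?P<later>\+?)$"
)


def lppl_acceptable(spec: str) -> bool:
    m = SPEC_RE.match(spec)
    if m is None:
        raise ValueError(f"unrecognised LPPL spec: {spec!r}")
    if m.group("later"):
        # "Or later" lets users choose a free version (1.3 or newer).
        return True
    # The letter revision (1.3a vs 1.3b) is parsed but not needed here.
    version = (int(m.group("major")), int(m.group("minor") or 0))
    return version > LAST_NONFREE


for spec in ["LPPL-1.3a", "LPPL-1.3b", "LPPL-1+", "LPPL-1.2+", "LPPL-1.2"]:
    print(spec, lppl_acceptable(spec))
```

Keeping the letter in the name costs nothing for a check like this, and would still let a tool tell 1.3a from 1.3b if that ever matters.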