
Re: making debian/copyright machine-interpretable


Sam Hocevar <sam <at> zoy.org> writes:

> On Sat, Aug 04, 2007, Joey Hess wrote:

[initial comment moved here to fool gmane's "you seem to be top-posting" message]
I came across this interesting thread only today (I was without network access
in early August).  Although I haven't read everything, it seems that one point
has not been raised yet. If you answer, please Cc me, since I'm not subscribed to
any Debian list except debian-tex-maint at the moment.
> > * I don't see much benefit in putting freeform text at the top of the
> >   file. Keeping it all at the bottom would simplify parsing/validating.
>    I tend to agree with you. The first version used to be like this,
> and I got many suggestions for adding freeform text at the beginning
> in order to keep the file human-readable so I switched to allowing it
> everywhere. I'll wait for more comments on why it may be useful, but the
> paragraph below about tarball origin seems already a valid use case.

Note that machine-readable copyright files might get very long (tetex-base in
etch has more than 1400 lines, most of which are in our poor-man's
machine-readable format, but that covers only a subset of the files in the
package).  Therefore I think we should either allow freeform text at the top of
the file, or at least a field that indicates "Hey, it might be worth scrolling
down to the end".

Two more points:

- Packages have bugs, and copyright files are no exception.  We should add a
  machine-readable field that indicates that the rest of the machine-readable
  information might be incorrect or incomplete.  Reasons might be "Upstream
  acknowledges that the license information in subdir foo/ is partly
  incorrect/nonfree and is working on relicensing or replacing the code,
  lenny-ignore tag granted" or "This package is a compilation of software from
  various sources. The integration work done by upstream is Copyright 2006 Alice
  Aiken, 2005 and 2006 Bob Brown, but the upstream license file does not indicate
  which files were altered or added by them. The list of these files was
  prepared by the Debian maintainer and might be incomplete".

  These reasons can be given in free form, but there should be a
  machine-readable field to indicate that such information exists.

- Looking at tetex-base's copyright file, I notice that we keep additional
  per-file-pattern information in it.  This is probably not needed for most
  packages, but it should be easy to add when it is useful.  It could, e.g.,
  go into a Notes field whose content is not necessarily machine-readable, but
  is clearly assigned to a particular file-pattern stanza.
  tetex-base uses entries like these:

      3.4 List of licenses of individual parts

      % abstract: lppl (verification data:1.1:1.1:2006-02-08:norbusan:)

      % acronym: lppl (verification data:1.26:1.17:2006-03-07:frank:header)

  which indicates that one part of tetex-base is the CTAN package abstract,
  licensed under the LPPL[1].  Since this information is not in the teTeX
  license file, we also record the following: the current version of abstract
  is 1.1; the license information was verified by norbusan (Norbert Preining)
  for version 1.1 on 2006-02-08; and he did not note which file in the abstract
  package contains the license information.  For acronym, I did the
  verification (for version 1.17), but there is now a newer upstream version
  (1.26), and I noted that the information is not in a particular file but in
  the headers of the individual source files.

  I suggest adding a Notes: field.  We would use it to record the name of the
  part as used in the TeX Catalogue (where we keep the license information)
  and the information in parentheses.
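To make the first point concrete, here is a minimal Python sketch of how a checker could notice such a flag without having to understand the freeform reason attached to it. It assumes the file is split into deb822-style paragraphs of "Field: value" lines with indented continuation lines (a simplified reading of the Debian control-file syntax). The field name X-Caveat is purely hypothetical, and the other field names in the sample are only loosely modeled on the proposal under discussion.

```python
from typing import Dict, List

# Hypothetical field name: the proposal only asks that *some*
# machine-readable field flag possibly incorrect or incomplete data.
CAVEAT_FIELD = "X-Caveat"

# Illustrative input; field names are assumptions, not a fixed format.
SAMPLE = """Files: *
Copyright: 2006 Alice Aiken
License: GPL-2+
X-Caveat: The list of files altered by upstream was
 prepared by the Debian maintainer and might be incomplete.

Files: foo/*
License: LPPL
"""


def parse_paragraphs(text: str) -> List[Dict[str, str]]:
    """Split deb822-style text into paragraphs of Field: value pairs.

    Continuation lines start with whitespace and are appended to the
    previous field's value. This is a simplified reading of deb822:
    field names are kept case-sensitive and " ." lines are not special.
    """
    paragraphs: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            if last_field is None:
                raise ValueError(f"continuation without a field: {line!r}")
            current[last_field] += "\n" + line.strip()
        else:
            field, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a field line: {line!r}")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def caveats(paragraphs: List[Dict[str, str]]) -> List[str]:
    """Return the freeform reasons of all paragraphs carrying the flag."""
    return [p[CAVEAT_FIELD] for p in paragraphs if CAVEAT_FIELD in p]


if __name__ == "__main__":
    for reason in caveats(parse_paragraphs(SAMPLE)):
        print("caveat:", reason)
```

The reason text stays freeform; only the presence of the field is machine-checked, which is what the first point asks for.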
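The tetex-base notation described in the second point is regular enough to parse mechanically. The following Python sketch reads one entry, whether it comes from the "%" lines of the current file or from a value in the proposed Notes: field. The meaning of the five colon-separated fields (current version, verified version, date, verifier, location of the license text) is inferred from the explanation above rather than taken from a documented format, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Matches e.g. "% abstract: lppl (verification data:1.1:1.1:2006-02-08:norbusan:)".
# The leading "%" is optional so a (hypothetical) Notes: value parses too.
ENTRY_RE = re.compile(
    r"^%?\s*(?P<name>[^:\s]+):\s*(?P<license>\S+)\s*"
    r"\(verification data:(?P<data>[^)]*)\)\s*$"
)


@dataclass
class Verification:
    name: str                 # part name as used in the TeX Catalogue
    license: str              # license short name, e.g. "lppl"
    current_version: str      # latest known upstream version
    verified_version: str     # version whose license info was checked
    verified_on: date         # date of the check
    verifier: str             # who did the check
    location: Optional[str]   # where the info lives; None if not recorded

    @property
    def outdated(self) -> bool:
        # True if the check was done against an older upstream release.
        return self.current_version != self.verified_version


def parse_entry(line: str) -> Verification:
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a verification entry: {line!r}")
    # Five colon-separated fields; the last one may be empty.
    fields = m.group("data").split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    current, verified, day, who, where = fields
    return Verification(
        name=m.group("name"),
        license=m.group("license"),
        current_version=current,
        verified_version=verified,
        verified_on=date.fromisoformat(day),
        verifier=who,
        location=where or None,
    )


if __name__ == "__main__":
    a = parse_entry("% abstract: lppl (verification data:1.1:1.1:2006-02-08:norbusan:)")
    b = parse_entry("% acronym: lppl (verification data:1.26:1.17:2006-03-07:frank:header)")
    print(a.outdated, a.location)  # False None
    print(b.outdated, b.location)  # True header
```

A tool could use the outdated property to list parts whose license check should be redone after an upstream update, as with acronym here.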

TIA, Frank

[1] By the way, should we distinguish between LPPL-1.3a and LPPL-1.3b, and
from things like LPPL-1+ and LPPL-1.2+?  (Versions 1.2 and older were not
DFSG-compatible, but of course any software with an "or later" clause is
acceptable.)
