
Re: debian/copyright verbosity

On Tue, 14 Apr 2009 19:27:33 +1000
Ben Finney <ben+debian@benfinney.id.au> wrote:

> This seems a useful summary:
> Neil Williams <codehelp@debian.org> writes:
> > Does Files: *.c mean that everything below applies equally to all
> > files that match the pattern or does it mean that the statement
> > includes a summary of all files that match the pattern?
> Before this thread, I was under the unquestioning assumption that the
> former interpretation was the only one. The question has never, to my
> knowledge, been raised explicitly like this before. I'd like to know
> what the consensus of the Debian project is on the question.

From the basis that most packages (particularly large packages)
currently in Debian main do not explicitly separate each Copyright
statement on the basis of individual source files and do not include a
verbatim copy of every single Copyright statement throughout the entire
source code within debian/copyright, I can only think that your
interpretation was too strict.

Collation of copyright statements is, AFAICT, the established pattern
within Debian - especially in large packages.

> > AFAICT it is perfectly acceptable for debian/copyright to collapse
> > those to:
> > 
> > >  Files: *.c
> > >  Copyright: 2006, 2008 Mr. X
> > >  Copyright: 2005 Mr. Y
> > >  License: GPL2+
> > 
> > There is no collapsing of the years - each year is described
> > separately.
> > 
> > The copyright is retained and each file is listed in debian/copyright
> > under the correct licence.
> That's pedantically true, perhaps, but only in the sense that I can say
> the entire works of Shakespeare are retained in the keys on a keyboard.
> The relevant question, it seems to me, is whether the information is
> preserved usefully.

Of course it is. Anyone needing more information has to go to the
source code: if you need more than the collated data, you will be
reading the source code files anyway.

> > You seem to be proposing an absurd exaggeration of wildcard semantics.
> I'm discussing what seems to me the only logical interpretation of the
> existing semantics. Whether those semantics need to be changed is a
> separate matter; it's new to me that there is even another way to
> interpret these semantics.

In that case, I'd suggest filing a bug against the Developer Reference
or possibly New Maintainer Guide stating that collation is the usual
practice.

It is implicit in the current New Maintainer Guide:


Upstream Author(s): <put author(s) name and email here>



Your debian/copyright file must contain the following information:

 - The author(s) name
 - The year(s) of the copyright
 - The used license(s)
 - The URL to the upstream source

Plural - with no requirement to separate the authors. 

In the example later on, the second licence causes the copyright
details to be listed separately.

I see no consideration there that the copyright holders should be
listed on a per-file basis - indeed, the example implicitly shows a
separation on a per-licence basis only.

> > Does Files: *.c mean that everything below applies equally to all
> > files that match the pattern or does it mean that the statement
> > includes a summary of all files that match the pattern?
> Before this thread, I was under the unquestioning assumption that the
> former interpretation was the only one. The question, to my knowledge,
> has never been raised explicitly like this before. I'd like to know what
> the consensus of Debian is on the question.

Consensus can also be gleaned from the common practice of packages
already in main. It is extremely common to find that debian/copyright
contains a single list of copyright holder details and a single licence
statement, no matter how those copyright details are actually
attributed throughout the source code. That is still the default
template from dh_make.

> > It does not confer anything in the reverse case, it does not mean that
> > the matching patterns have any relationship to each other other than
> > the pattern. Matching * simply means that the file is one of the set of
> > files that match the pattern - it does not follow that every statement
> > about those files applies equally to all matches, other than that they
> > all match the pattern.
> Then what can it usefully mean to apply a single group of statements to
> an undistinguished glob of files? 

It means:

"The files matching the pattern comprise a set of files which when
taken as a group have the following copyright holders and licence
statement."

Notice: plural holders, singular statement.
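To make the "summary" reading concrete, here is a rough sketch in
Python of what collation does. The helper name collate() and the
per-file data are invented for illustration (they are not from any
Debian tool); the resulting stanza mirrors the Mr. X / Mr. Y example
quoted earlier in this thread.

```python
from collections import defaultdict


def collate(per_file, pattern, licence):
    """Collapse per-file copyright notices into one stanza for a pattern.

    per_file maps a file name to the (holder, year) pairs found in that
    file's header. Hypothetical helper, for illustration only.
    """
    years_by_holder = defaultdict(set)
    for notices in per_file.values():
        for holder, year in notices:
            years_by_holder[holder].add(year)

    lines = [f"Files: {pattern}"]
    # Holders are sorted by name; each year is listed separately,
    # not collapsed into a range.
    for holder in sorted(years_by_holder):
        years = ", ".join(str(y) for y in sorted(years_by_holder[holder]))
        lines.append(f"Copyright: {years} {holder}")
    lines.append(f"License: {licence}")
    return "\n".join(lines)


# Invented per-file data: no single file carries every notice.
per_file = {
    "a.c": [("Mr. X", 2006)],
    "b.c": [("Mr. X", 2008), ("Mr. Y", 2005)],
    "c.c": [("Mr. Y", 2005)],
}
stanza = collate(per_file, "*.c", "GPL2+")
print(stanza)
```

The stanza describes the set of files as a group: a.c carries no
notice from Mr. Y, yet Mr. Y appears under Files: *.c.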

That is a perfectly usable format. The only reason you'd want more
information is if you are copying a section of one source code file
into another project with a compatible licence. In that case you are
reading the source code file before copying anyway, so you can choose
to copy the smaller copyright statement from that particular file - or
you can choose to copy the entire collated statement. Technically, the
latter adds someone to the copyright holders who was not explicitly
listed as a copyright holder for that file, but who is going to care
about that?

There is no legal issue with collating copyright across a group of
files. Have you any idea how many copyright holders have copyright over
the various components in something the size of Debian? Of course
collation has to be acceptable - individual attribution is completely
impossible to achieve, nobody can afford to do it.

> If you're not intending the whole
> statement to apply to the whole glob, I don't see what the reader can
> usefully interpret it to mean in the absence of extra information —
> which seems counter to your purpose, as I understand it.

It means that the files, as a group, are copyright to those people.
That's all. That's all that it needs to mean.

> > Not at all, debian/copyright is not about the claim, it is about the
> > summary of what claims exist. The claims themselves cannot be divorced
> > from the source code
> What do you mean, they cannot be divorced? 

The claim of copyright is entirely in the remit of the source code file
- the licence usually prevents us from changing that. In that sense,
the copyright cannot be divorced from the source code because we cannot
*remove* someone from the list of copyright holders within the source
code files themselves. At no point is anyone asked to modify the
copyright statements of source code files in Debian packages -
mentors@l.d.o has consistently stated as much. The copyright
statements of source code files are inviolate - they can only be
changed by upstream.

However, debian/copyright is entirely our own - we can choose to do
what we want with that file.

As an upstream contributor, I can only say that the level of
introspection that you seem to assume happens in the assignment of
copyright to individual files is completely absent in all of the
upstream projects to which I have contributed.

The upstream model, as I've experienced it, is one of two methods:

1. If you make "significant changes" to a source code file, feel free
to add yourself to the Copyright of that file - there is absolutely no
rigour to the definition of "significant". I've seen significant
changes that change a / to a % - *once* in a single file out of
hundreds in the source package. It was a long standing bug that
previous contributors simply hadn't spotted - fixing it merited an
addition to the Copyright of that file. I've seen changes to source
code files that touch 50% of the lines in the file without any
Copyright attribution being requested or assigned.

2. You don't add anything to Copyright in any file until one of the
core maintainers for the package does it for you.

Neither is particularly rigorous, neither is absolute. Both methods are
inherently asynchronous and inconsistent between projects.

> > You're reading something different into the * wildcard.
> > 
> > *.c does not mean that everything applies equally to every file, it
> > means that for the files that match the pattern, the following
> > copyright statements may apply - i.e. a summary. I don't see anything
> > wrong with that.
> Before we get to whether it's wrong in the sense of “acceptable”,
> whence does this interpretation come? It's certainly new to me, and
> incompatible with what I understood the meaning of the information in
> ‘debian/copyright’.

It comes from dh_make, it comes from the vast majority of
debian/copyright files already in main and it comes from the simple
logic that your interpretation cannot be implemented in any large
package.

> > > You seem to be advocating that ‘debian/copyright’ is not, itself,
> > > making any specific copyright claims beyond “some combination of
> > > these holders, these years, and these files, have some set of
> > > copyright associations;
> > 
> > Absolutely.
> Why, then, put such a vague statement into the file at all? What is the
> useful threshold of summarisation? These are questions it has never
> occurred to me might need to be asked, but your interpretation makes
> them necessary.

The threshold is fuzzy - it's constructed from an abstract sense of
what the maintainer feels is suitable and what the upstream decide to
put into particular files, commonly AUTHORS. Upstream do not include
the copyright details of everyone who has ever submitted a patch so
those details are completely impossible to add. The main purpose,
AFAICT, is that the main contributors are correctly attributed, nothing
more.

> Some means of achieving consensus, then. (Which may, of course, be
> possible simply by discussing it in ‘debian-devel’.)

As I've said, consensus can also be reached by simply scanning the
existing debian/copyright files on your own system.
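As a rough sketch of such a scan (Python): installed packages ship
their copyright file as /usr/share/doc/<package>/copyright. The
function name survey() is made up for this example, and looking for a
"Files:" field is only a crude heuristic for whether a file separates
its statements by pattern or uses a single collated block.

```python
import glob
import os


def survey(doc_root="/usr/share/doc"):
    """Count copyright files with and without any 'Files:' field.

    Returns (with_files_field, without_files_field). Crude heuristic:
    a file with no 'Files:' field is taken to be one collated block.
    """
    with_files = 0
    without_files = 0
    for path in glob.glob(os.path.join(doc_root, "*", "copyright")):
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
        except OSError:
            # Unreadable or dangling entries are skipped.
            continue
        if any(line.startswith("Files:") for line in text.splitlines()):
            with_files += 1
        else:
            without_files += 1
    return with_files, without_files


if __name__ == "__main__":
    per_pattern, single_block = survey()
    print(f"with Files: fields: {per_pattern}")
    print(f"single collated block: {single_block}")
```

The counts will vary from system to system; the point is only that the
existing files are there to be inspected.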

Policy only documents existing practice - if you want to know what
Debian feels is the consensus on a packaging issue that is not
described in Policy, studying existing practice is a valid way of
discovering how to proceed with your own practice.

We do not need every single stage to be laboriously mangled into
legalese for Policy.



Neil Williams
