What licenses should be included in /usr/share/common-licenses?

Hello everyone,

I come seeking your opinions.  Please cc 885698@bugs.debian.org on replies
so that we can accumulate this discussion in a Debian Policy bug.

One of the responsibilities of the Policy Editors is to determine which
licenses should be included in /usr/share/common-licenses, and thus do not
have to be reproduced in the copyright file of every package that uses
them.  We have never had a clear criterion for this.  We need one, so that
we can advertise a clear and transparent policy for inclusion without
having the conversation from first principles for each new license.

I was the one who made the last few decisions, and I based them largely
on the number of binary packages in Debian using the license.  When I was
doing this, I set a fairly high threshold (more packages than the least
popular license currently in /usr/share/common-licenses, which
historically has been GFDL-1.3 although it now appears to be MPL-1.1).  No
one was entirely satisfied with that criterion, including me.

I have the following questions:

1. What criteria (besides the obvious one of being a DFSG-free license)
   should we apply when deciding what licenses to include?  Number of
   packages?  Length?  How positive we feel towards the license?  Some
   combination of these things?  Please be specific.

2. If we use number of packages as a criterion, what should the threshold
   be?  I have appended to the bottom of this message the current output
   of my ad-hoc license-count tool run against the current archive so that
   you have a feeling for how many packages use various licenses.

3. If we use number of packages, should that be source packages or binary
   packages?  Source packages represent maintainer effort; binary packages
   represent disk clutter.

4. Should there be a length cutoff for licenses, such that we do not
   include in /usr/share/common-licenses any license shorter than some
   number of lines or bytes?  The justification would be that telling
   people to go look elsewhere for the license has some inherent overhead
   and annoyance when they discover that the license is all of ten lines
   and could have just been included in the copyright file.

5. Should we exclude licenses that contain text that all or most users of
   the license customize when they use it?  For example, the existing
   /usr/share/common-licenses/BSD contains the clause:

      3. Neither the name of the University nor the names of its
         contributors may be used to endorse or promote products derived
         from this software without specific prior written permission.

   which users of this specific license usually change to name their
   organization, themselves, or something else instead.  Full
   disclosure: it will be very hard to convince me that licenses used this
   way should be included in common-licenses, since I believe it is
   technically incorrect to omit a license and point to the
   common-licenses version when the provisions of the common-licenses
   version are different in detail due to naming different people or
   requiring or prohibiting mentioning of different names as endorsements.

Here are various concerns that people have had in this area in the past.
I'm neither indicating agreement nor disagreement with any of these
points, only listing them to provoke thought about some of the things
people have raised before.

* Including long legal texts in debian/copyright, particularly if one
  wants to format them for copyright-format, is tedious and annoying and
  doesn't benefit our users in any significant way, and therefore we
  should include as many licenses as possible in common-licenses to spare
  people that work.

* common-licenses consumes disk space on every installed Debian system of
  any size, and therefore should be kept small to avoid wasting system
  resources.

* Every approved DFSG-free license should be included in common-licenses so
  that it serves as a repository of licenses the project has approved.

* Including a license in common-licenses implies that the project approves
  of that license, and therefore licenses such as the LaTeX Project Public
  License 1.0, which requires renaming derived works, should not be
  included even though DFSG #4 grudgingly allows for this type of license.

* All licenses explicitly mentioned in the Debian Free Software Guidelines
  should be present in common-licenses (as justification for including the
  BSD license even though the current text is specific to the Regents of
  the University of California).

In order to structure the discussion and prod people into thinking about
the implications, I will make the following straw man proposal.  This is
what I would do if the decision were entirely up to me:

    Licenses will be included in common-licenses if they meet all of the
    following criteria:

    * The license is DFSG-free.
    * Exactly the same license wording is used by all works covered by it.
    * The license applies to at least 100 source packages in Debian.
    * The license text is longer than 25 lines.
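To make the straw man concrete, the four criteria can be read as a single
predicate that a license must satisfy in full.  The following Python sketch
is illustrative only: the LicenseInfo record and its field names are
hypothetical, and judgments such as DFSG-freeness or uniform wording would
in practice come from human review rather than code.

```python
from dataclasses import dataclass


@dataclass
class LicenseInfo:
    # Hypothetical record; field names are illustrative, not from any Debian tool.
    name: str
    dfsg_free: bool          # decided by human review
    uniform_wording: bool    # every covered work uses exactly the same text
    source_packages: int     # number of source packages in Debian using it
    text_lines: int          # length of the license text in lines


MIN_SOURCE_PACKAGES = 100  # "at least 100 source packages", so 100 passes
MIN_TEXT_LINES = 25        # "longer than 25 lines", so 25 itself fails


def qualifies(lic: LicenseInfo) -> bool:
    """Return True only if the license meets all four straw-man criteria."""
    return (
        lic.dfsg_free
        and lic.uniform_wording
        and lic.source_packages >= MIN_SOURCE_PACKAGES
        and lic.text_lines > MIN_TEXT_LINES
    )
```

For a hypothetical license used by exactly 100 source packages with a
26-line text, qualifies(LicenseInfo("Example", True, True, 100, 26)) is
True; dropping to 99 packages or to 25 lines makes it False, as does any
license whose wording varies between the works that use it.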

I will attempt to guide and summarize discussion on this topic.  No
decision will be made immediately; I will summarize what I've heard first
and be transparent about what direction I think the discussion is
converging towards (if any).

Finally, as promised, here is the count of source packages in unstable
that use the set of licenses that I taught my script to look for.  This is
likely not accurate; the script uses a bunch of heuristics and guesswork.

AGPL 3                  277
Apache 2.0             5274
Artistic               4187
Artistic 2.0            337
BSD (common-licenses)    42
CC-BY 1.0                 3
CC-BY 2.0                15
CC-BY 2.5                13
CC-BY 3.0               240
CC-BY 4.0               159
CC-BY-SA 1.0              8
CC-BY-SA 2.0             48
CC-BY-SA 2.5             16
CC-BY-SA 3.0            425
CC-BY-SA 4.0            237
CC0-1.0                1069
CDDL                     67
CeCILL                   30
CeCILL-B                 13
CeCILL-C                  9
GFDL (any)              569
GFDL (symlink)           55
GFDL 1.2                289
GFDL 1.3                231
GPL (any)             20006
GPL (symlink)          1331
GPL 1                  4033
GPL 2                 10466
GPL 3                  6783
LGPL (any)             5019
LGPL (symlink)          265
LGPL 2                 3850
LGPL 2.1               2926
LGPL 3                 1526
LaTeX PPL                46
LaTeX PPL (any)          40
LaTeX PPL 1.3c           32
MPL 1.1                 165
MPL 2.0                 361
SIL OFL 1.0              11
SIL OFL 1.1             258
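
As a rough way to see where the 100-source-package line in the straw man
would fall, the count criterion alone can be applied mechanically to a
table in the format above.  This Python sketch assumes each line is a
license name followed by a count; above_threshold is a hypothetical helper,
not the license-count script itself.  Because the table mixes aggregate
rows such as "GPL (any)" with per-version rows, and the counts are
heuristic, its output is only a rough guide.

```python
def above_threshold(table: str, minimum: int = 100) -> list[str]:
    """Return names of licenses whose count is at least `minimum`,
    most-used first.  Each table line is "<license name> <count>"."""
    counts = []
    for line in table.strip().splitlines():
        # The count is the last whitespace-separated field; the name
        # may itself contain spaces, e.g. "CC-BY-SA 3.0".
        name, count = line.rsplit(maxsplit=1)
        counts.append((name.strip(), int(count)))
    counts.sort(key=lambda item: item[1], reverse=True)
    return [name for name, count in counts if count >= minimum]


# A few rows copied from the table above.
sample = """
CDDL                     67
CC-BY 4.0               159
GFDL 1.3                231
LaTeX PPL                46
MPL 1.1                 165
"""
print(above_threshold(sample))  # ['GFDL 1.3', 'MPL 1.1', 'CC-BY 4.0']
```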

Russ Allbery (rra@debian.org)              <https://www.eyrie.org/~eagle/>

