
Re: New field in binary stanza

On Mon, Dec 24, 2007 at 06:52:12PM +0100, David Paleino wrote:

> Anyway, I'm seeing that what I'm telling now has already been proposed for
> debian/copyright. The problem is still there though: the chance to see some
> information about the license of not installed packages not being
> connected to the Internet.

This is solvable by packaging a file with the data extracted from the
archive: the information will then end up on the CD.  I do that for

> Well, most of Debian packages have simple licenses (see: GPL, BSD, MIT). And,
> again, the field would be totally optional.

In the other mail I sent to this thread I showed the steps that could
be followed to implement this with apt-xapian-index.  The tricky parts
are the first and second steps:

 1. Define what kind of searches you want to allow people to do
 2. Define what kind of information you need to index for those searches

I mentioned that it might not be possible to tackle these steps in a
useful way.  To understand why I say this, consider:

 - the variety of licenses we have in the archive
 - that different bits of a package can have different licenses
 - that the copyright file applies to the source package but the search
   probably happens on binary packages.

I had a look at http://wiki.debian.org/Proposals/CopyrightFormat, and I
strongly endorse that proposal.  The 'License:' field proposed there
looks like the best data source for this.  However, if more than
something like 20% of the packages in the archive end up having
'License: other', in my experience that field risks ending up
useless for searches.

Consider also this scenario:

  Source package foo contains a debian/copyright file that says "the
  library is LGPL, the executable tools are GPL, the examples are WTFPL,
  the debian packaging is BSD-3"[1].
  How should we handle it?  I can think of two cases:

   1. libfoo-dev only shows LGPL, libfoo-bin only shows GPL,
      libfoo-examples only shows WTFPL.  In this case, how do you sort
      the various licenses into the binary packages?  And also, where
      did BSD-3 go?
   2. All the binary packages list all the licenses.  In this case,
      when you search for WTFPL (or BSD-3) you end up with libfoo-dev,
      libfoo-bin and loads of other false positives among the results.
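
A minimal sketch of the two cases, to make the trade-off concrete.  The
package names and licenses come from the scenario above; the mapping
from parts of the source to binary packages, and the helper names
case1(), case2() and search(), are assumptions made only for this
illustration:

```python
# Licenses declared in the (hypothetical) debian/copyright of source 'foo'.
SOURCE_LICENSES = {
    "library": "LGPL",
    "tools": "GPL",
    "examples": "WTFPL",
    "packaging": "BSD-3",
}

# Assumed mapping from parts of the source to binary packages.
# Case 1 needs this mapping, and nothing in debian/copyright provides it.
COMPONENT_TO_BINARY = {
    "library": "libfoo-dev",
    "tools": "libfoo-bin",
    "examples": "libfoo-examples",
}


def case1():
    """Each binary package shows only the license of its own part."""
    return {
        binary: {SOURCE_LICENSES[component]}
        for component, binary in COMPONENT_TO_BINARY.items()
    }


def case2():
    """Every binary package lists every license of the source package."""
    all_licenses = set(SOURCE_LICENSES.values())
    return {binary: set(all_licenses) for binary in COMPONENT_TO_BINARY.values()}


def search(index, license_name):
    """Return the sorted names of binary packages listing license_name."""
    return sorted(pkg for pkg, lics in index.items() if license_name in lics)


if __name__ == "__main__":
    print("case 1, BSD-3:", search(case1(), "BSD-3"))
    print("case 2, WTFPL:", search(case2(), "WTFPL"))
```

In case 1 the search for BSD-3 finds nothing, because the packaging
has no binary package of its own.  In case 2 the search for WTFPL
returns all three packages, two of which are false positives.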

I know it's easy to think "'License: GPL' is all I need", and I also
know it's easy to think "it's too much of a mess, it can't be done".
What is hard to think is "let's see what really can be done".

To really attack this problem, we need some statistics about what the
distribution of licenses across the archive really is, so that we know
what we're talking about.  I suppose that starting to adopt
http://wiki.debian.org/Proposals/CopyrightFormat could be a good way to
make it possible to collect such statistics.
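
As a rough sketch of what such statistics could look like, assuming
the 'License:' values have already been extracted into a list of
strings (the extraction itself is not shown, and the function names
and sample data are made up for illustration):

```python
from collections import Counter


def license_distribution(license_fields):
    """Map each 'License:' value to its share of the given packages."""
    counts = Counter(license_fields)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def other_share(license_fields):
    """Share of packages whose field is 'other' (0.0 if there are none)."""
    return license_distribution(license_fields).get("other", 0.0)


if __name__ == "__main__":
    # Invented sample, not real archive data.
    sample = ["GPL", "GPL", "BSD", "other", "MIT"]
    for name, share in sorted(license_distribution(sample).items()):
        print(f"{name}: {share:.0%}")
    print(f"share of 'other': {other_share(sample):.0%}")
```

If the share of 'other' measured on the real archive comes out above
something like 20%, that would be a sign that the field alone is not
enough for useful searches.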

Another rather important thing that can be done at this stage is to
provide use cases for using the data, check if
http://wiki.debian.org/Proposals/CopyrightFormat provides enough
information to support those use cases, and in case something is missing
see if it can reasonably be added and how.



[1] Once CC-BY-SA 3.0 is out, you can reasonably add "Documentation
    is CC-BY-SA-3", and a libfoo-doc package to the list.
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>
