Re: Bug#374569: groff-base: groff-base includes non-free material
On Sun, Jun 25, 2006 at 11:08:34AM +0100, Colin Watson wrote:
> On Wed, Jun 21, 2006 at 04:54:01PM -0500, Manoj Srivastava wrote:
> > On 20 Jun 2006, Colin Watson verbalised:
> > > It's dual-licensed upstream; I contacted upstream years ago about
> > > this issue (before it became particularly public that Debian had a
> > > problem with the licence) and arranged for the following statement
> > > to be added to the top-level LICENSES file:
> > >
> > > All files part of groff are licensed under this version of the GPL
> > > (or licenses which are compatible with the GPL). You are free to
> > > choose version 2 or any subsequent version of the GPL.
> > >
> > > Unfortunately, for technical reasons (see bug #196762), it is
> > > extremely difficult to upgrade to the new upstream release. If you
> > > like, I can simply include a note in the copyright file with similar
> > > contents to this e-mail, although I don't know if that's good form.
> > Unfortunately, I think that means we have to take the stance
> > that the old version is non-free, but the future version is freed;
> > unless we can get upstream to release the version in Debian with the
> > new license.
> I guess in that case I will have to resume efforts to get 1.19 sorted
> out more urgently. I don't want to embark upon the busy-work of
> splitting documentation out into a separate package only to put it back
> in again later ...

So, I've done some more investigation, and I'd like advice from the
release team.

For those not familiar with groff, it has historically accepted only
ISO-8859-1 input; internally, it has always been very much hardwired for
single-byte input. The Debian groff package has for a long time
contained a highly complex "multibyte" patch to support EUC-JP (and
later other CJK) input; it works, but is not terribly clean, and for one
thing causes groff's behaviour to depend on the locale rather than
solely the file it's processing and its command-line arguments, which is
wrong for a text processor like groff. It also contains support for an
"ascii8" device which essentially just passes through the encoding of
the source text; this is typographically unsound because, for instance,
you can't do decent hyphenation that way, but we're relying on this for
Czech, Croatian, Hungarian, Polish, Russian, Slovak, and Turkish man
pages at the moment.

Upstream has long stated an intent never to accept this patch, and
instead wants to work on UTF-8 support, with a preprocessor to convert
from other encodings as necessary.

This has been the state of play for several years now. I've tried to
port the Debian multibyte patch forward to groff 1.19 and later releases
on more than one occasion, but it's a very complex and intrusive patch
and I've hit roadblocks that are extremely hard to surmount. groff 1.19
made a number of internal improvements for the better (notably Unicode
composite glyphs), but the changes conflict in a big way with the
multibyte patch. The authors of that patch haven't seemed able to help,
and upstream is entirely uninterested. I've pretty much been stuck
maintaining a package based on 1.18.1.1, with no way to jump forward
without breaking the now significant number of users relying on CJK
support. Some people have suggested reviving the old jgroff package for
this and making man use it where appropriate; I'm very much loath to do
this, because it's a non-trivial amount of unrewarding packaging work,
and it results in either bloating the base system with two versions of
groff or requiring all CJK users to know or be told to install jgroff.

On another note, while the GFDL discussion was still bubbling on
debian-private and before it came up publicly as an issue, I noted that
most of groff's documentation was under the GFDL, and was very concerned
about the usability of groff in the event that its documentation had to
be removed; I'd have serious trouble writing any non-trivial groff
documents without the groff documentation! I contacted groff upstream to
ask whether its documentation could be dual-licensed under the GPL.
After some discussion, they agreed, resulting in a note in groff's
LICENSES file that "All files part of groff are licensed under this
version of the GPL". Unfortunately, this note was added after groff
1.18.1.1 was released, and Manoj points out that it's not entirely
obvious that we can take advantage of it. This prompted me to have
another look at the current state of groff upstream with respect to

Bruno Haible has been working on Unicode support in groff, and CVS groff
is now very close to being able to render CJK text on a par with what
the Debian patch offers, by means of a preprocessor ("preconv") that
converts all non-ASCII text into groff escapes according to an encoding
specified on the command line. There are a number of other internal
improvements in Unicode support too, although input is still
fundamentally single-byte; however the escaping preprocessor makes this
less important than it used to be. The major missing features in
Japanese rendering are handling of double-width characters and support
for kinsoku shori (Japanese line-breaking rules). Werner Lemberg, the
upstream maintainer, is very clear that these should be implemented by
means of adding glyph class infrastructure to groff, so that properties
of ranges of glyphs can be set in groff's font files without using lots
of memory to say that each of several thousand glyphs is double-width.
This is a moderate chunk of work, but it's at least reasonably
accessible from where we are now.

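To illustrate the glyph-class idea: the point is that a property such as
"double-width" is attached to a handful of code point ranges rather than
being stored once per glyph. The Python sketch below is purely
illustrative; it is not groff's design or its font-file syntax. The class
names, the ranges (a rough subset of CJK blocks), and the helpers
`glyph_class` and `display_width` are all assumptions made up for the
example.

```python
from bisect import bisect_right

# Hypothetical class table: (first code point, last code point, class name).
# The ranges are an illustrative subset, not a complete width table.
# Entries must be sorted by first code point and must not overlap.
GLYPH_CLASSES = [
    (0x3040, 0x309F, "wide"),  # Hiragana
    (0x30A0, 0x30FF, "wide"),  # Katakana
    (0x4E00, 0x9FFF, "wide"),  # CJK Unified Ideographs
    (0xFF01, 0xFF60, "wide"),  # Fullwidth forms
]
_STARTS = [first for first, _, _ in GLYPH_CLASSES]


def glyph_class(ch, default="narrow"):
    """Return the class of a single character via binary search on ranges."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0 and cp <= GLYPH_CLASSES[i][1]:
        return GLYPH_CLASSES[i][2]
    return default


def display_width(text):
    """Count terminal columns: 2 for "wide" glyphs, 1 for everything else."""
    return sum(2 if glyph_class(ch) == "wide" else 1 for ch in text)
```

Four table entries cover many thousands of glyphs here, which is the
memory argument in a nutshell; a class for characters that may not start
a line (for kinsoku shori) could be expressed the same way.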
The situation for non-ISO-8859-1 single-byte encodings is essentially
solved. The ascii8 device is superseded by preconv. Implementing
hyphenation for Russian wouldn't do any harm, but it's all a matter of
macro files from here on in.

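As a rough illustration of the preprocessing approach: the input is
decoded from an encoding named explicitly by the caller (not taken from
the locale), and every non-ASCII character is rewritten as a groff
`\[uXXXX]` Unicode escape, so that what reaches groff proper is plain
ASCII. This Python sketch is not preconv's implementation; the function
name `to_groff_escapes` is invented for the example.

```python
def to_groff_escapes(data: bytes, encoding: str) -> str:
    """Decode `data` using `encoding`; escape every non-ASCII character."""
    text = data.decode(encoding)
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII passes through untouched
        else:
            out.append("\\[u%04X]" % cp)  # e.g. U+041F becomes \[u041F]
    return "".join(out)


if __name__ == "__main__":
    # A KOI8-R encoded Russian word, as a legacy man page might contain.
    sample = "Привет".encode("koi8_r")
    print(to_groff_escapes(sample, "koi8_r"))
```

Because the encoding is an explicit argument, the same input file always
produces the same output regardless of the user's locale.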
I'm on holiday away from computers all next week, so I can't
realistically do anything before the base freeze. I therefore have two
proposals, either of which really ought to be signed off by the release
team.

One is to do nothing for now, and make an exception for the groff
licensing bug on the bases that (a) groff is nearly unusable for
authoring without its documentation and (b) upstream considers the
current versions of those files to be GPL-licensed. I wouldn't make that
request based on (a) alone - we've had the "but it's too useful to be
non-free!" whine many times before, and I don't think it's valid - but
given (b) it seems to be worth considering. I don't think that splitting
off groff's documentation is a good idea, because aside from the small
man-page-formatting-only part of groff that's in groff-base, the rest of
the groff package is really too painful to use without its
documentation: much harder than e.g. make without make-doc. I'm willing
to go to almost any lengths to avoid that option.

The other option is to try to accelerate the implementation of glyph
classes, width handling, and kinsoku shori handling in groff as much as
possible, so that we can update to CVS groff (perhaps with some
additional patches) and not regress CJK support too badly. This would
also require changes in man-db. Obviously, this would require getting
testing from several CJK users to confirm that the output is still
reasonably readable, and it involves an exception to the base freeze.

How does the release team feel about all of this? I'm sorry to have left
it so late.

Colin Watson [email@example.com]