
Re: debconf template translation

On Mon, Mar 03, 2003 at 08:29:12PM +0100, Michael Bramer wrote:
> I don't like manpage-LANG, ... packages.

Yeah, me neither. Their principal benefit right now is that as long as
relatively few people have localized man pages we have a hope of sorting
the encoding issues out. I'm working on code for man-db at the moment
which throws the current broken encoding calculations out of the window
and does it all from scratch; we may have to make a few alterations once
all this has settled.

[For those who care:

The problem is that, traditionally, groff's input encoding is defined as
ISO-8859-1. No ifs, no buts. Yes, even for UTF-8 output. For anything
outside that range you're supposed to use named characters from
groff_char(7).
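
For illustration, here is a minimal Python sketch of that rule applied to man page source. It passes 7-bit ASCII through unchanged and rewrites each non-ASCII character as a groff named-character escape. The helper name `to_named_chars` and its seven-entry table are hypothetical. The escape names are intended to match groff_char(7), but check them against the groff_char(7) page shipped with your groff before relying on them. Unknown characters raise an error instead of slipping through in an undefined encoding.

```python
# Hypothetical helper: rewrite non-ASCII characters in man page source as
# groff named-character escapes. The table is a small illustrative sample,
# not the full groff_char(7) list; verify names against your groff version.
GROFF_NAMES = {
    "\u00e9": r"\('e",  # e with acute
    "\u00e8": r"\(`e",  # e with grave
    "\u00fc": r"\(:u",  # u with diaeresis
    "\u00df": r"\(ss",  # German sharp s
    "\u00e7": r"\(,c",  # c with cedilla
    "\u00f1": r"\(~n",  # n with tilde
    "\u0142": r"\(/l",  # Polish l with stroke (outside ISO-8859-1)
}


def to_named_chars(text: str) -> str:
    """Return text with every known non-ASCII character replaced by a
    groff escape. Raise ValueError for any non-ASCII character that has
    no entry in GROFF_NAMES, so it is never emitted in a guessed encoding."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain 7-bit US-ASCII: safe as-is
        elif ch in GROFF_NAMES:
            out.append(GROFF_NAMES[ch])
        else:
            raise ValueError(f"no groff name known for {ch!r} (U+{ord(ch):04X})")
    return "".join(out)


if __name__ == "__main__":
    print(to_named_chars("Stra\u00dfe, caf\u00e9"))  # prints: Stra\(sse, caf\('e
```

Failing loudly on an unmapped character is deliberate: passing it through silently is exactly what leaves groff guessing at the input encoding. The sketch is not meant for multibyte (CJK) text.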

Now, we have lots of Debian patches to groff which add multibyte
language support - excellent, as unpatched groff doesn't really have a
chance of managing this right now. However, the patches change encoding
expectations, sometimes in ways that aren't very well defined: the
ascii8 device, for example, breaks all kinds of assumptions, and the
fact that all sorts of stuff relies on it renders it impossible to do
things like format Polish man pages to DVI. I hope we can eventually
throw it away.

Short-term and long-term upstream changes to groff will improve the
situation, with things like official ISO-8859-2 support for some devices
coming soon, and UTF-8 input on the distant horizon. man needs to be
changed so that it can use all of this properly, and various bugs in
groff need to be fixed too.

I plan to try to hammer out the worst encoding problems over the next
few months. As this progresses it may become necessary to contact
maintainers of some localized man page collections and ask for bulk
changes. I hope not, but it's possible. In the meantime I very strongly
recommend that man page authors follow the advice in groff_char(7) and
use only named characters: certainly for anything outside ISO-8859-1 and,
if possible, for anything outside 7-bit US-ASCII, with the exception of
multibyte (CJK) languages. If there are problems with this, contact me.]

Colin Watson                                  [cjwatson@flatline.org.uk]
