
Re: Man pages and UTF-8


I proposed to Colin to work on it during Debconf, but I still have not had time to do so.

Interested people should read bug #196762.

I tested a CVS snapshot of groff, and it now supports UTF-8 input (thanks
to the preconv preprocessor) without patches. There is at least one
remaining issue: it does not recognize families of glyphs.
All glyphs are therefore treated as having the same size (we won't provide
a font description file listing every Unicode character), so the
output of groff is ugly.
(Apart from this issue, French, English, Japanese and Vietnamese UTF-8
manpages displayed nicely.)

I will port the part of ENABLE_MULTIBYTE that allows specifying ranges
of characters, and see if it looks OK.
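
To illustrate why ranges help, here is a minimal sketch of a range-based
width lookup in Python. It is not groff's or ENABLE_MULTIBYTE's code: the
table, the function names and the "2 cells" value are assumptions made for
illustration only. The point is that a few (first, last, width) entries can
cover thousands of characters without listing each one.

```python
from bisect import bisect_right

# Hypothetical (first, last, cells) ranges, sorted and non-overlapping.
# Illustrative values only; this is not data taken from groff.
WIDE_RANGES = [
    (0x3040, 0x309F, 2),  # Hiragana
    (0x30A0, 0x30FF, 2),  # Katakana
    (0x4E00, 0x9FFF, 2),  # CJK Unified Ideographs
]
_STARTS = [first for first, _, _ in WIDE_RANGES]


def cell_width(ch, default=1):
    """Return the width of one character, falling back to `default`."""
    cp = ord(ch)
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        first, last, cells = WIDE_RANGES[i]
        if first <= cp <= last:
            return cells
    return default


def text_width(text):
    """Sum the per-character widths of a string."""
    return sum(cell_width(ch) for ch in text)
```

With this table, Latin and Vietnamese letters fall through to the default
width, while kana and kanji pick up the width of their range.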

All this is based on still-unreleased software. I don't think using 0.19
would help support UTF-8 manpages in Debian (and porting
ENABLE_MULTIBYTE would be necessary). The CVS version looks much more
(But if you want to try 0.19, have a look at the bug mentioned above;
there is a patch in the bug log.)

The CVS version introduces a -K option to specify the encoding
of groff's input file. Using this option in man-db should help in planning
a transition to UTF-8 manpages. Slowly moving files from man/ to
man.UTF-8/ while still supporting the legacy encoding in man/ would
be a simple transition plan.
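
As a rough sketch of that plan (this is not man-db code), the lookup could
prefer man.UTF-8/ and fall back to man/, passing the matching encoding to
groff with -K. The function name, the man<section>/<page>.<section> layout
below each tree, and the "latin1" default for legacy pages are assumptions
for illustration; the real legacy encoding depends on the language.

```python
import os


def groff_command(man_root, section, page, legacy_encoding="latin1"):
    """Build a groff command line for a page, preferring the UTF-8 tree.

    Hypothetical helper: returns an argv list, or None if the page is
    found in neither man.UTF-8/ nor man/.
    """
    rel = os.path.join("man" + section, page + "." + section)
    utf8_path = os.path.join(man_root, "man.UTF-8", rel)
    legacy_path = os.path.join(man_root, "man", rel)

    if os.path.isfile(utf8_path):
        path, encoding = utf8_path, "utf8"
    elif os.path.isfile(legacy_path):
        # Not migrated yet: keep using the legacy encoding (assumed).
        path, encoding = legacy_path, legacy_encoding
    else:
        return None

    # -K sets the input encoding; -Tutf8 selects UTF-8 output.
    return ["groff", "-K", encoding, "-Tutf8", "-man", path]
```

A page that exists in both trees is read from man.UTF-8/, so files can be
moved one at a time without breaking the pages that have not moved yet.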

BTW, if you want to provide a kanji-form, you can try something like:
.warn 0
Kimutaku <test@example.com>

The .warn 0 request should suppress all warnings. The kanji form is not
supported yet, so it won't be displayed until it is. This should not affect
the other lines providing the romaji form of the author's name.

Note: the only real issue with the lack of UTF-8 support for manpages in
Debian is that it is not possible to provide manpages translated into
languages whose only valid encoding is UTF-8 (e.g. Vietnamese).
Otherwise our man-db/groff combination works really nicely and displays
manpages with very few annoyances (i.e. I don't consider having
to drop my cedilla in manpages to be a real issue).
UTF-8 is supported on output, so it is really transparent for users.

Best Regards,
