Re: Man pages and UTF-8

On Sun, Aug 12, 2007 at 08:44:13PM -0700, Russ Allbery wrote:
> Adam Borowski <kilobyte@angband.pl> writes:
> > Issues to fix
> > =============
> > A. man output
> > B. groff processing
> > C. man input
> > Fixes for A. and B. are mostly local to "man-db", fixing C. would be a
> > Debian-wide issue.
> What I was trying to get at earlier is that I believe groff can't handle
> UTF-8 input.  So fixing B, if I'm correct, is certainly not local to
> man-db. 

Yeah, it's mostly a groff issue.  Yet, groff and man-db are very closely

> I believe that fixing groff to handle multibyte character sets
> properly is a substantial amount of work.

Of course.  The thing is, Red Hat has already done this work.  They also
already converted all their man pages into UTF-8 in 2004.

> groff can apparently produce UTF-8 *output*, but I think the encodings of
> all of its input at the moment are in other character sets.

The current Debian groff can produce UTF-8 output only for a narrow range of
characters: those that happen to be present in 8-bit charsets.  It cannot
handle UTF-8 input at all.  Red Hat's version, on the other hand, seems to
work just fine.
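
To make "a narrow range of characters" concrete, here is a rough Python
sketch that checks which characters of a string exist in a given 8-bit
charset.  It does not model groff's internals in any way; the function names
and the choice of ISO-8859-1 as the default charset are my own assumptions,
picked only for illustration.

```python
# Illustration only: this says nothing about how groff itself works.
# The helper names and the default charset are assumptions for this sketch.

def representable_in(text, charset="iso-8859-1"):
    """Return True if every character of text exists in the 8-bit charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def unrepresentable_chars(text, charset="iso-8859-1"):
    """List, in order of first appearance, the characters missing from charset."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    sample = "za\u017c\u00f3\u0142\u0107"  # Polish letters, written as escapes
    print(representable_in(sample, "iso-8859-1"))
    print(representable_in(sample, "iso-8859-2"))
    print(unrepresentable_chars(sample, "iso-8859-1"))
```

A string that fits one 8-bit charset may not fit another, which is why a
pipeline limited to such charsets can only ever cover part of what a UTF-8
man page may contain.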

