Re: Man pages and UTF-8
On Sun, Aug 12, 2007 at 08:44:13PM -0700, Russ Allbery wrote:
> Adam Borowski <kilobyte@angband.pl> writes:
>
> > Issues to fix
> > =============
>
> > A. man output
> > B. groff processing
> > C. man input
>
> > Fixes for A. and B. are mostly local to "man-db", fixing C. would be a
> > Debian-wide issue.
>
> What I was trying to get at earlier is that I believe groff can't handle
> UTF-8 input. So fixing B, if I'm correct, is certainly not local to
> man-db.
Yeah, it's mostly a groff issue. Yet, groff and man-db are very closely
related.
> I believe that fixing groff to handle multibyte character sets
> properly is a substantial amount of work.
Of course. The thing is, Red Hat has already done this work. They also
already converted all their man pages into UTF-8 in 2004.
> groff can apparently produce UTF-8 *output*, but I think the encodings of
> all of its input at the moment are in other character sets.
The current Debian groff can produce UTF-8 output only for a narrow range of
characters, ones which happen to be present in 8 bit charsets. It cannot
handle UTF-8 input at all; on the other hand, Red Hat's version seems to be
working just fine.
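
To illustrate what "present in 8 bit charsets" means in practice, here is a
rough Python sketch. It is not how groff works internally; the function name
and the choice of ISO-8859-1 as the 8-bit charset are only assumptions made
for the example. It splits a string into characters that fit in the 8-bit
charset and characters that do not, the latter being the kind that cannot
come out right today.

```python
def representable_in_8bit(text, charset="iso-8859-1"):
    """Split text into characters that fit in `charset` and those that do not.

    Hypothetical helper for illustration only; the charset is an assumption.
    """
    ok, missing = [], []
    for ch in text:
        try:
            ch.encode(charset)      # succeeds only if the 8-bit charset has this character
            ok.append(ch)
        except UnicodeEncodeError:  # character lies outside the 8-bit charset
            missing.append(ch)
    return ok, missing


# Accented Latin-1 letters pass; curly quotes and Polish letters do not.
ok, missing = representable_in_8bit("na\u00efve \u201cquotes\u201d \u017c\u00f3\u0142w")
print("fits:", "".join(ok))
print("lost:", "".join(missing))
```

With ISO-8859-1, "ï" and "ó" fit, while the curly quotes, "ż" and "ł" do not.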
--
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.