Bug#99933: second attempt at more comprehensive unicode policy
On Wed, Jan 15, 2003 at 04:41:57PM +0900, Junichi Uekawa wrote:
> > > Not all of the statements made in that thread are quite true,
> > > and I seem to remember seeing some hacks done by Ukai-san in that
> > > respect, for UTF-8.
> >
> > Hmmm...could you elaborate?
>
> I think our man-db and groff have been hacked in two ways:
>
> 1) to special-case the Japanese locale (ja_JP.eucJP) and
> act specially in that case only (using the -Tnippon device)
>
> 2) to work with utf-8

2) is present in groff upstream, actually, but 1) interferes with it in
some exciting ways. We can probably manage to patch it up so that UTF-8
doesn't break quite so badly, but really it's almost impossible to get
completely correct output in all encodings from current groff, which has
historically had a hard-coded expectation of ISO-8859-1 input that
reaches quite deeply into its design. There is no (standard) way for a
document to state its encoding. groff 2.0 is planned to fix this by,
among other things, changing its input encoding expectation to be UTF-8
instead, but that's some way off yet.
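
To make the ISO-8859-1 point concrete, here is a minimal Python illustration (not groff code; the sample word is an arbitrary assumption) of what a program that assumes ISO-8859-1 input does with UTF-8 bytes: each byte of a multi-byte UTF-8 sequence is read as a separate Latin-1 character, so "é" (0xC3 0xA9) comes out as two characters.

```python
# Illustration only, not groff code: decode UTF-8 bytes as if they were
# ISO-8859-1, the input encoding groff has historically assumed.
text = "café"                               # arbitrary sample word (assumption)
utf8_bytes = text.encode("utf-8")           # b'caf\xc3\xa9': "é" is two bytes
misread = utf8_bytes.decode("iso-8859-1")   # every byte maps to one character
print(misread)                              # cafÃ©
```

Nothing in the byte stream itself says which reading is the right one, which is exactly the problem of a document having no standard way to state its encoding.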

man has a big table of language directories and what groff output
devices are conventional in each. It's clearly not exactly ideal, but
it's the best we've got for now.
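
The language-directory table idea can be sketched roughly as follows. This is a hypothetical Python illustration, not man-db's actual table or code: the names `DEVICE_BY_LANGUAGE`, `language_of` and `groff_device` are made up, the only entry taken from this thread is the Japanese -Tnippon case, and falling back to `latin1` is an assumption based on groff's ISO-8859-1 heritage.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical table for illustration; NOT man-db's real data.
DEVICE_BY_LANGUAGE = {
    "ja": "nippon",  # the Japanese special case discussed in this thread
}
DEFAULT_DEVICE = "latin1"  # assumption: groff's historical ISO-8859-1 default


def language_of(page_path: str) -> Optional[str]:
    """Return the language directory of a page such as
    /usr/share/man/ja/man1/ls.1, or None if there is none."""
    parts = PurePosixPath(page_path).parts
    # Expect .../<lang>/man<section>/<page>; bail out on anything shorter.
    if len(parts) < 3:
        return None
    section_dir, candidate = parts[-2], parts[-3]
    if not (section_dir.startswith("man") and len(section_dir) > 3):
        return None
    if candidate == "man":  # e.g. /usr/share/man/man1/ls.1: no language dir
        return None
    # Reduce e.g. "ja_JP.eucJP" to the bare language code "ja".
    return candidate.split(".")[0].split("_")[0]


def groff_device(page_path: str) -> str:
    """Pick the groff output device (-T argument) for a page."""
    return DEVICE_BY_LANGUAGE.get(language_of(page_path), DEFAULT_DEVICE)


print(groff_device("/usr/share/man/ja/man1/ls.1"))  # nippon
print(groff_device("/usr/share/man/man1/ls.1"))     # latin1
```

As sketched, the lookup keys on the directory alone, so a user running under ja_JP.UTF-8 would still get -Tnippon; that is roughly the kind of interference with UTF-8 described above.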

I think it is undeniably true that the man-db/groff toolchain is not yet
ready for Debian policy to mandate UTF-8.

> I seem to remember 1) was the case in potato, or woody, breaking
> use under ja_JP.utf-8.

ja_JP.UTF-8 may be hackable in man nowadays; please send patches if you
can get it to work. :)

> I think Colin Watson should know better about the status...

I can supply pointers, but Fumitoshi UKAI is the real expert on groff
encodings.
--
Colin Watson [cjwatson@flatline.org.uk]