
Bug#99933: second attempt at more comprehensive unicode policy



On Wed, Jan 15, 2003 at 04:41:57PM +0900, Junichi Uekawa wrote:

> > > Not all of the statements made in that thread are quite true,
> > > and I seem to remember seeing some hacks done by Ukai-san in that
> > > respect, for UTF-8.
> > 
> > Hmmm...could you elaborate?
> 
> I think our man-db and groff have been hacked in two ways:
> 
> 1) to special-case the Japanese locale (ja_JP.eucJP) and
> act specially in that case only (using the -Tnippon device)
> 
> 2) to work with utf-8

2) is present in groff upstream, actually, but 1) interferes with it in
some exciting ways. We can probably manage to patch it up so that UTF-8
doesn't break quite so badly, but really it's almost impossible to get
completely correct output in all encodings from current groff, which has
historically had a hard-coded expectation of ISO-8859-1 input that
reaches quite deeply into its design. There is no (standard) way for a
document to state its encoding. groff 2.0 is planned to fix this by,
among other things, changing its input encoding expectation to be UTF-8
instead, but that's some way off yet.

man has a big table of language directories and what groff output
devices are conventional in each. It's clearly not exactly ideal, but
it's the best we've got for now.
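
As a rough illustration of that kind of table, here is a minimal sketch
in Python. It is not man's actual code or data: the names
DEVICE_BY_LANG_DIR, DEFAULT_DEVICE and groff_device are hypothetical,
and the only entries are the two devices discussed in this thread
(nippon for Japanese, latin1 as the ISO-8859-1 fallback).

```python
# Hypothetical table: man page language directory -> groff output device.
# The entries are illustrative assumptions, not man-db's real table.
DEVICE_BY_LANG_DIR = {
    "ja": "nippon",  # the -Tnippon device used for ja_JP.eucJP
}

# Assumed fallback, reflecting groff's historical ISO-8859-1 expectation.
DEFAULT_DEVICE = "latin1"


def groff_device(lang_dir):
    """Return the conventional groff device for a language directory."""
    # Reduce a full locale name to its language, e.g. "ja_JP.eucJP" -> "ja".
    lang = lang_dir.split(".")[0].split("_")[0]
    return DEVICE_BY_LANG_DIR.get(lang, DEFAULT_DEVICE)
```

Note that a lookup keyed on the language alone ignores the codeset, so
in this sketch ja_JP.UTF-8 would also be sent to the nippon device. That
is one plausible way a UTF-8 locale could end up with the wrong output.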

I think it is undeniably true that the man-db/groff toolchain is not yet
ready for Debian policy to mandate UTF-8.

> I seem to remember 1 was the case in potato, or woody, breaking 
> use under ja_JP.utf-8.

ja_JP.UTF-8 may be hackable in man nowadays; please send patches if you
can get it to work. :)

> I think Colin Watson should know better about the status...

I can supply pointers, but Fumitoshi UKAI is the real expert on groff
encodings.

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]


