Bug#99933: second attempt at more comprehensive unicode policy

To: 99933@bugs.debian.org
Subject: Bug#99933: second attempt at more comprehensive unicode policy
From: Colin Watson <cjwatson@debian.org>
Date: Wed, 15 Jan 2003 11:11:13 +0000
Message-id: <20030115111112.GA17972@riva.ucam.org>
Reply-to: Colin Watson <cjwatson@debian.org>, 99933@bugs.debian.org
In-reply-to: <871y3e97wa.wl@atoron.work.isl.doshisha.ac.jp.doshisha.ac.jp>
References: <20030106030032.GA1754@night> <1041830487.15092.20.camel@space-ghost> <20030106210151.GA1603@tatonka.pfalz.de> <1041893714.19862.10.camel@space-ghost> <87vg11ouge.fsf@christoph.complete.org> <1041964317.32088.3.camel@space-ghost> <20030107220435.GA22074@zobe.linuxfr.org> <878yxn851w.wl@atoron.work.isl.doshisha.ac.jp.doshisha.ac.jp> <1042605905.28341.3.camel@space-ghost> <871y3e97wa.wl@atoron.work.isl.doshisha.ac.jp.doshisha.ac.jp>

On Wed, Jan 15, 2003 at 04:41:57PM +0900, Junichi Uekawa wrote:

> > > Not all of the statements made in that thread are quite true,
> > > and I seem to remember seeing some hacks done by Ukai-san on that
> > > respect, for UTF-8.
> > 
> > Hmmm...could you elaborate?
> 
> I think our man-db and groff have been hacked in two ways:
> 
> 1) to special-case the Japanese locale (ja_JP.eucJP) and 
> act specially in that case only (using -Tnippon device)
> 
> 2) to work with UTF-8

2) is present in groff upstream, actually, but 1) interferes with it in
some exciting ways. We can probably manage to patch it up so that UTF-8
doesn't break quite so badly, but really it's almost impossible to get
completely correct output in all encodings from current groff, which has
historically had a hard-coded expectation of ISO-8859-1 input that
reaches quite deeply into its design. There is no (standard) way for a
document to state its encoding. groff 2.0 is planned to fix this by,
among other things, changing its input encoding expectation to be UTF-8
instead, but that's some way off yet.
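
To illustrate the kind of breakage a fixed ISO-8859-1 input expectation
causes, here is a minimal sketch in Python. It is not anything groff
itself runs; the function name read_as_latin1 is made up for the
example. Every byte value is a valid ISO-8859-1 character, so input in
another encoding is never rejected, it is just silently misread:

```python
def read_as_latin1(data: bytes) -> str:
    """Interpret raw bytes the way a Latin-1-only reader would.

    Hypothetical helper for illustration; it only mimics the assumption
    that all input is ISO-8859-1.
    """
    return data.decode("iso-8859-1")


# "é" is one character, but two bytes in UTF-8.
utf8_input = "né".encode("utf-8")
misread_utf8 = read_as_latin1(utf8_input)

# Three kanji are six bytes in EUC-JP, so they come out as six
# unrelated Latin-1 characters.
eucjp_input = "日本語".encode("euc_jp")
misread_eucjp = read_as_latin1(eucjp_input)

print(repr(misread_utf8))
print(len(misread_eucjp))
```

Since nothing in the input says which encoding was intended, the reader
has no way to notice the mistake, which is why the encoding has to be
supplied from outside the document.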

man has a big table of language directories and what groff output
devices are conventional in each. It's clearly not exactly ideal, but
it's the best we've got for now.
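
As a rough sketch of what such a table amounts to (in Python, with
made-up entries; this is not man's actual table or code, and the names
DEVICE_BY_LANG_DIR and device_for are hypothetical), the lookup is
keyed on the language directory and falls back to a default device:

```python
# Hypothetical mapping from man page language directory to groff
# output device. Only the "ja" -> "nippon" pairing comes from this
# thread; the other entries are illustrative assumptions.
DEVICE_BY_LANG_DIR = {
    "ja": "nippon",
    "de": "latin1",
    "fr": "latin1",
}
DEFAULT_DEVICE = "ascii"  # assumed fallback for unknown directories


def device_for(lang_dir: str) -> str:
    """Return the conventional device for a language directory.

    Tries the directory name as given, then the bare language code
    (so "ja_JP.eucJP" falls back to "ja").
    """
    lang = lang_dir.split(".")[0].split("_")[0]
    if lang_dir in DEVICE_BY_LANG_DIR:
        return DEVICE_BY_LANG_DIR[lang_dir]
    return DEVICE_BY_LANG_DIR.get(lang, DEFAULT_DEVICE)


print(device_for("ja"))
print(device_for("ja_JP.eucJP"))
print(device_for("xx"))
```

A table like this only records what is conventional for a directory. It
cannot tell what encoding a particular page is really in, which is one
reason it may be less than ideal.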

I think it is undeniably true that the man-db/groff toolchain is not yet
ready for Debian policy to mandate UTF-8.

> I seem to remember 1 was the case in potato, or woody, breaking 
> use under ja_JP.utf-8.

ja_JP.UTF-8 may be hackable in man nowadays; please send patches if you
can get it to work. :)

> I think Colin Watson should know better about the status...

I can supply pointers, but Fumitoshi UKAI is the real expert on groff
encodings.

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]
