
Re: Unicode conversion goals for Sarge (was Re: debian-changelog-file-uses-obsolete-national-charset)



[Please honour my Mail-Followup-To: header.]

On Wed, Mar 31, 2004 at 04:41:44PM +0300, Martin-Éric Racine wrote:
> On Wed, 31 Mar 2004, Colin Watson wrote:
> > On Wed, Mar 31, 2004 at 09:05:04AM +0300, Martin-Éric Racine wrote:
> > > Heck, if you ask me, Sarge should be known as the "we upgrade everyone
> > > to UTF-8" Debian release. This would imply that absolutely every
> > > package to be released in Sarge would know about legacy encodings for
> > > each locale and be able to recode every config file, man page,
> > 
> > Not possible. UTF-8 man pages are not yet supported by groff, and won't
> > be until groff 2.0.
> 
> Do you envision this as being possible for Sarge+1 then?

That entirely depends on when it gets implemented in groff. It's
something upstream is working on and something on which there's been
some incremental progress in recent versions of groff, but it's
fundamentally hard and will probably involve incompatible changes.

> At this point, it seems that all Debian-specific tools either default to
> UTF-8 or can handle UTF-8, so it doesn't seem like such a difficult goal.

I don't think you'd say that if you knew more about groff internals. The
assumption of ISO-8859-* runs deep (chiefly ISO-8859-1 - it's only
recently that decent support for ISO-8859-2 and ISO-8859-9 was added),
and it needs quite a few internal changes to remove that assumption. It
is not at all a simple matter of adding calls to iconv, since groff
needs to know more about the text than that. The extensive Debian patch
to groff manages to support Japanese and possibly other CJK languages,
but that patch is so extensive that I've been unable to update it to
groff 1.19.
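
One small facet of the charset problem can be shown outside groff: the same input byte stands for a different character in each ISO-8859 part, so a consumer has to know which part applies before it can treat the byte as text. The Python sketch below is only an illustration of that point. The helper name `decode_byte` is hypothetical, and nothing here reflects how groff handles input internally.

```python
# Illustration only: one byte, three different characters, depending on
# which ISO-8859 part the reader assumes. Not groff's actual logic.

def decode_byte(byte_value: int, charset: str) -> str:
    """Interpret a single byte under the given 8-bit charset."""
    return bytes([byte_value]).decode(charset)

CHARSETS = ("iso-8859-1", "iso-8859-2", "iso-8859-9")

# Byte 0xF0 is eth in Latin-1, d with stroke in Latin-2, g with breve in Latin-5.
readings = {cs: decode_byte(0xF0, cs) for cs in CHARSETS}

for cs, ch in readings.items():
    print(f"{cs}: U+{ord(ch):04X}")
```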

man (mostly) supports you running in a UTF-8 locale by means of some
complicated iconv kludges. It will *not* generally support UTF-8 in
source man pages until groff upstream supports it. While it would be
possible to shove another iconv in at the start (and in fact this is
done for ja_JP.UTF-8 for evil reasons), I don't want to put myself in
the position of having to convert the world twice in the event that
groff upstream do the transition slightly differently from the way I'd
do it.
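
For illustration, "another iconv at the start" would amount to recoding the page source from UTF-8 into the legacy charset before the formatter sees it. The sketch below shows that step in Python under stated assumptions: the function name `recode_page_source` and the default target of ISO-8859-1 are hypothetical choices, and this is not the code man uses.

```python
# Sketch of an up-front recode step, assuming UTF-8 source and a legacy
# 8-bit target charset. Hypothetical helper, not man's implementation.

def recode_page_source(raw: bytes,
                       source_enc: str = "utf-8",
                       legacy_enc: str = "iso-8859-1") -> bytes:
    """Recode page source bytes from source_enc to legacy_enc.

    Raises UnicodeEncodeError when a character has no equivalent in the
    legacy charset, which is one reason a blanket recode is not enough.
    """
    text = raw.decode(source_enc)
    return text.encode(legacy_enc)

# A name with an accented capital E survives the trip to Latin-1.
latin1_bytes = recode_page_source("Martin-\u00c9ric".encode("utf-8"))
print(latin1_bytes)
```

Text outside the target charset (Japanese, for instance) makes the encode step fail, so a real pipeline would need a per-locale choice of target charset rather than one fixed default.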

I think UTF-8 is the future, use it fairly extensively myself, and will
continue to work towards having it everywhere, but let's not make the
mistake Red Hat made of pushing UTF-8 beyond the current capabilities of
our software.

Cheers,

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]


