Re: Bug#440420: [PROPOSAL] Manual page encoding
- To: Colin Watson <cjwatson@debian.org>
- Cc: 440420@bugs.debian.org, debian-i18n@lists.debian.org
- Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
- From: Russ Allbery <rra@debian.org>
- Date: Tue, 01 Jan 2008 11:37:30 -0800
- Message-id: <87tzlxs5fp.fsf@windlord.stanford.edu>
- In-reply-to: <20071231143748.GZ13328@riva.ucam.org> (Colin Watson's message of "Mon, 31 Dec 2007 14:37:48 +0000")
- References: <20070901120232.GB18492@riva.ucam.org> <46DC2A62.50402@debian.org> <20070903164719.GE6091@riva.ucam.org> <46DD1D67.6020906@debian.org> <20070904105256.GK6091@riva.ucam.org> <871w93cr9f.fsf@windlord.stanford.edu> <20071231143748.GZ13328@riva.ucam.org>
Colin Watson <cjwatson@debian.org> writes:
> Right. Here's an update; I think I've captured most of the discussion in
> the thread so far. The following patch could in principle be applied
> now, given seconds. Wordsmithing welcome, as I'm aware that this is a
> rather dense recommendation; I'm also looking for seconds for this
> proposal.
This proposal and patch look good to me, although I'd prefer to see a few
more seconds before I queue it up for applying to Policy 3.7.4.
> I'm still open to whether new-world-order pages should go in
> /usr/share/man/LL.UTF-8 or just /usr/share/man/LL. Pros for LL.UTF-8:
>
> * Non-compliant implementations (I'm guessing xman, yelp, etc.) will
> display English manual pages rather than misencoded garbage. This
> might not be such a big deal for European languages, but for e.g.
> Japanese I suspect most people would prefer English to the spew you
> get by trying to interpret UTF-8 as EUC-JP.
I'd rather fix the other implementations, frankly. All of Debian is
moving towards UTF-8, as is all of the rest of the Linux world, and I'd
rather not leave transitional measures around forever.
> * Determining progress towards universal UTF-8 encoding can trivially
> be done by scanning Contents files rather than having to unpack the
> archive and run iconv over everything.
Yeah, but we already have an unpacked version of the archive available in
the lintian lab, so doing this isn't too bad.
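
A rough sketch of what such a scan over an unpacked tree might look like.
This is only an illustration, not the lintian check itself: the function
names and the idea of passing in the root of a manual page hierarchy are
assumptions made for the example.

```python
import gzip
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_pages(man_root):
    """List files under man_root whose contents are not valid UTF-8.

    man_root is a hypothetical path to an unpacked manual page tree;
    files ending in .gz are decompressed before being checked.
    """
    bad = []
    for path in sorted(Path(man_root).rglob("*")):
        if not path.is_file():
            continue
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rb") as fh:
            if not is_utf8(fh.read()):
                bad.append(path)
    return bad
```

Pure ASCII pages pass the check as well, since ASCII is a subset of UTF-8,
so the result only lists pages that still need recoding.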
> * In the event that we later want to migrate to yet another
> "universal" encoding that can't be automatically distinguished from
> UTF-8, we already have the encoding name right there and migration
> will be straightforward. (I think this is an unlikely scenario.)
Yes, this seems extremely unlikely to me. UTF-8 isn't perfect, but it
seems to have reached the "good enough" level that people will work around
its flaws rather than replace it with something else.
> I think I am increasingly leaning towards just using /usr/share/man/LL,
> seeing as man has to try decoding pages there as UTF-8 first anyway, but
> please comment if you care.
I agree with this position.
> Unfortunately 2.5.0 wasn't quite enough. Aside from a couple of stupid
> bugs (mostly fixed now), it turns out that we need an extra feature to
> allow debhelper to produce UTF-8 versions of manual pages without
> needing the source encoding to be explicitly specified, by guessing the
> encoding in the same way that man does:
>
> http://lists.debian.org/debian-i18n/2007/10/msg00063.html
>
> I committed this feature to my development trunk earlier today, and will
> be working on a 2.5.1 release over the next couple of weeks. After that
> I'll send Joey a patch for debhelper.
It sounds like the same feature could be used by other man implementations
that currently can't deal with UTF-8.
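
For other implementations, the guessing approach described above (try
UTF-8 first, then fall back to a legacy encoding chosen from the language
directory) could be sketched roughly as follows. This is not man-db's
code: the table contents, the Latin-1 default, and the function names are
all assumptions made for illustration.

```python
# Illustrative fallback table (an assumption, not man-db's real table):
# language code of the man directory -> legacy encoding to try.
LEGACY_ENCODINGS = {
    "ja": "euc_jp",
    "ru": "koi8_r",
    "pl": "iso8859_2",
}
DEFAULT_LEGACY = "iso8859_1"


def guess_encoding(data: bytes, lang: str) -> str:
    """Guess a page's encoding: UTF-8 if it decodes, else a per-language legacy one."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Reduce e.g. "pt_BR.ISO8859-1" to "pt" before the table lookup.
        base = lang.split(".")[0].split("_")[0]
        return LEGACY_ENCODINGS.get(base, DEFAULT_LEGACY)


def recode_to_utf8(data: bytes, lang: str) -> bytes:
    """Return the page contents re-encoded as UTF-8."""
    return data.decode(guess_encoding(data, lang)).encode("utf-8")
```

The guess is a heuristic: a legacy-encoded page that happens to be valid
UTF-8 would be misdetected, though that should be rare for real text.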
The transition plan looks good to me.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>