Bug#440420: [PROPOSAL] Manual page encoding
Colin Watson <cjwatson@debian.org> writes:
> Right. Here's an update; I think I've captured most of the discussion in
> the thread so far. The following patch could in principle be applied
> now, given seconds. Wordsmithing welcome, as I'm aware that this is a
> rather dense recommendation; I'm also looking for seconds for this
> proposal.
This proposal and patch look good to me, although I'd prefer to see a few
more seconds before I queue it up for applying to Policy 3.7.4.
> I'm still open to whether new-world-order pages should go in
> /usr/share/man/LL.UTF-8 or just /usr/share/man/LL. Pros for LL.UTF-8:
>
> * Non-compliant implementations (I'm guessing xman, yelp, etc.) will
> display English manual pages rather than misencoded garbage. This
> might not be such a big deal for European languages, but for e.g.
> Japanese I suspect most people would prefer English to the spew you
> get by trying to interpret UTF-8 as EUC-JP.
I'd rather fix the other implementations, frankly. All of Debian is
moving towards UTF-8, as is all of the rest of the Linux world, and I'd
rather not leave transitional measures around forever.
> * Determining progress towards universal UTF-8 encoding can trivially
> be done by scanning Contents files rather than having to unpack the
> archive and run iconv over everything.
Yeah, but we already have an unpacked version of the archive available in
the lintian lab, so doing this isn't too bad.
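
For illustration only, a scan over an unpacked tree could be sketched
roughly as below. This is not lintian code; the function names are
made up for the example, and it assumes that "is UTF-8" simply means
"decodes cleanly as UTF-8" (so pure ASCII pages pass as well).

```python
import gzip
import os


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_pages(root: str) -> list[str]:
    """Walk root (hypothetically an unpacked usr/share/man tree) and
    return the paths of pages whose contents are not valid UTF-8."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Installed pages are normally gzip-compressed; read others raw.
            opener = gzip.open if name.endswith(".gz") else open
            with opener(path, "rb") as fh:
                if not is_valid_utf8(fh.read()):
                    bad.append(path)
    return sorted(bad)
```

Something like find_non_utf8_pages("usr/share/man") run per package
would give the same progress numbers as scanning Contents files, just
with more I/O.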
> * In the event that we later want to migrate to yet another
> "universal" encoding that can't be automatically distinguished from
> UTF-8, we already have the encoding name right there and migration
> will be straightforward. (I think this is an unlikely scenario.)
Yes, this seems extremely unlikely to me. UTF-8 isn't perfect, but it
seems to have reached the "good enough" level that people will work around
its flaws rather than replace it with something else.
> I think I am increasingly leaning towards just using /usr/share/man/LL,
> seeing as man has to try decoding pages there as UTF-8 first anyway, but
> please comment if you care.
I agree with this position.
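
To make the "try UTF-8 first" behaviour concrete, here is a minimal
sketch of the idea. It is my own illustration and not man's actual
code: the fallback table is a small assumed sample, and the names
decode_page and LEGACY_ENCODINGS are hypothetical.

```python
# Assumed, deliberately incomplete mapping from language directory to
# the legacy encoding historically used there; illustrative only.
LEGACY_ENCODINGS = {"ja": "euc_jp", "ru": "koi8_r", "pl": "iso8859_2"}
DEFAULT_LEGACY = "iso8859_1"


def decode_page(data: bytes, lang: str) -> tuple[str, str]:
    """Decode a page from /usr/share/man/LL, returning (text, encoding used).

    UTF-8 is tried first; only if that fails do we fall back to the
    legacy encoding assumed for the language.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        legacy = LEGACY_ENCODINGS.get(lang, DEFAULT_LEGACY)
        # errors="replace" keeps a damaged page readable instead of failing.
        return data.decode(legacy, errors="replace"), legacy
```

This works because legacy-encoded text containing non-ASCII characters
is very rarely valid UTF-8 by accident, so the first attempt is a
reasonable discriminator and a separate LL.UTF-8 directory buys little.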
> Unfortunately 2.5.0 wasn't quite enough. Aside from a couple of stupid
> bugs (mostly fixed now), it turns out that we need an extra feature to
> allow debhelper to produce UTF-8 versions of manual pages without
> needing the source encoding to be explicitly specified, by guessing the
> encoding in the same way that man does:
>
> http://lists.debian.org/debian-i18n/2007/10/msg00063.html
>
> I committed this feature to my development trunk earlier today, and will
> be working on a 2.5.1 release over the next couple of weeks. After that
> I'll send Joey a patch for debhelper.
It sounds like the same feature could be used by other man implementations
that currently can't deal with UTF-8.
The transition plan looks good to me.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>