
Re: Bug#440420: [PROPOSAL] Manual page encoding



Colin Watson <cjwatson@debian.org> writes:

> Right. Here's an update; I think I've captured most of the discussion in
> the thread so far. The following patch could in principle be applied
> now, given seconds. Wordsmithing welcome, as I'm aware that this is a
> rather dense recommendation; I'm also looking for seconds for this
> proposal.

This proposal and patch look good to me, although I'd prefer to see a few
more seconds before I queue it up for applying to Policy 3.7.4.

> I'm still open to whether new-world-order pages should go in
> /usr/share/man/LL.UTF-8 or just /usr/share/man/LL. Pros for LL.UTF-8:
>
>   * Non-compliant implementations (I'm guessing xman, yelp, etc.) will
>     display English manual pages rather than misencoded garbage. This
>     might not be such a big deal for European languages, but for e.g.
>     Japanese I suspect most people would prefer English to the spew you
>     get by trying to interpret UTF-8 as EUC-JP.

I'd rather fix the other implementations, frankly.  All of Debian is
moving towards UTF-8, as is all of the rest of the Linux world, and I'd
rather not leave transitional measures around forever.

>   * Determining progress towards universal UTF-8 encoding can trivially
>     be done by scanning Contents files rather than having to unpack the
>     archive and run iconv over everything.

Yeah, but we already have an unpacked version of the archive available in
the lintian lab, so doing this isn't too bad.
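To make the trade-off concrete, here is a rough Python sketch of the two
kinds of check. It assumes Contents-style lines of the form "path
section/package"; the regular expression and the function names are
illustrative only, not taken from lintian or any existing tool. With
LL.UTF-8 directories the path alone is enough; with plain LL directories
each page's bytes have to be inspected.

```python
import re

# Assumed layout: usr/share/man/<LL>[.UTF-8]/man<section>/<page>.
# The pattern is an illustration, not an exhaustive locale-name matcher.
MAN_PATH = re.compile(
    r"^usr/share/man/([a-z]{2,3}(?:_[A-Z]{2})?)(\.UTF-8)?/man[^/]+/\S+"
)


def count_by_layout(contents_lines):
    """Count localized manual pages by directory layout, from paths alone."""
    counts = {"utf8_dir": 0, "plain_dir": 0}
    for line in contents_lines:
        match = MAN_PATH.match(line)
        if match is None:
            continue  # not a localized manual page
        key = "utf8_dir" if match.group(2) else "plain_dir"
        counts[key] += 1
    return counts


def is_valid_utf8(data: bytes) -> bool:
    """Content check needed when pages live in plain LL directories."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The second function needs the (decompressed) page contents, which is why
it requires an unpacked archive such as the one in the lintian lab.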

>   * In the event that we later want to migrate to yet another
>     "universal" encoding that can't be automatically distinguished from
>     UTF-8, we already have the encoding name right there and migration
>     will be straightforward. (I think this is an unlikely scenario.)

Yes, this seems extremely unlikely to me.  UTF-8 isn't perfect, but it
seems to have reached the "good enough" level that people will work around
its flaws rather than replace it with something else.

> I think I am increasingly leaning towards just using /usr/share/man/LL,
> seeing as man has to try decoding pages there as UTF-8 first anyway, but
> please comment if you care.

I agree with this position.
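
For anyone following along, the "try UTF-8 first" behavior can be
sketched roughly as below. This is only an illustration under stated
assumptions, not man-db's actual code: the fallback table, the ISO-8859-1
default, and the function names are all hypothetical. The only pairing
taken from this thread is Japanese with EUC-JP.

```python
# Hypothetical fallback table; a real implementation would carry a fuller
# per-language list. Only ja -> EUC-JP comes from the discussion above.
LEGACY_ENCODINGS = {"ja": "euc_jp"}
DEFAULT_LEGACY = "iso-8859-1"  # assumed default for this sketch


def decode_page(data: bytes, lang: str):
    """Return (text, encoding_used), trying UTF-8 before the legacy guess."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        encoding = LEGACY_ENCODINGS.get(lang, DEFAULT_LEGACY)
        # errors="replace" keeps the sketch from failing on stray bytes.
        return data.decode(encoding, errors="replace"), encoding


def to_utf8(data: bytes, lang: str) -> bytes:
    """Re-encode a page as UTF-8 without being told its source encoding."""
    text, _ = decode_page(data, lang)
    return text.encode("utf-8")
```

Because legacy-encoded text with non-ASCII characters is rarely valid
UTF-8, the first attempt seldom misidentifies an old page, which is what
makes a single /usr/share/man/LL directory workable.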

> Unfortunately 2.5.0 wasn't quite enough. Aside from a couple of stupid
> bugs (mostly fixed now), it turns out that we need an extra feature to
> allow debhelper to produce UTF-8 versions of manual pages without
> needing the source encoding to be explicitly specified, by guessing the
> encoding in the same way that man does:
>
>   http://lists.debian.org/debian-i18n/2007/10/msg00063.html
>
> I committed this feature to my development trunk earlier today, and will
> be working on a 2.5.1 release over the next couple of weeks. After that
> I'll send Joey a patch for debhelper.

It sounds like the same feature could be used by other man implementations
that currently can't deal with UTF-8.

The transition plan looks good to me.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

