
Re: Please keep an eye on manpage encoding issues (especially in etch).



Thank you for explaining the situation for some other languages, Nori. :)

On 12/11/2006, at 6:35 PM, Kobayashi Noritada wrote:


Manpage encoding issues are seen for some packages and for some
languages; some manpages are encoded in UTF-8 and unreadable in any
environment.

Can we specify the encoding in the manpage text in some way?

No, the input encoding is determined by the locale and input manpage
path, which is hard-coded in man-db's src/encodings.c.

:(
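
To make the point above concrete, here is a minimal sketch of what "encoding determined by the manpage path" can look like. The table and the function name are invented for illustration and are not copied from man-db's src/encodings.c; the only entries taken from this thread are EUC-JP for Japanese and latin-1 for European languages.

```python
from pathlib import PurePosixPath

# Illustrative table only: NOT man-db's real table from src/encodings.c.
ASSUMED_LEGACY_ENCODING = {
    "ja": "EUC-JP",
    "de": "ISO-8859-1",
    "fr": "ISO-8859-1",
}
DEFAULT_ENCODING = "ISO-8859-1"  # assumption for untranslated pages


def guess_encoding_from_path(manpage_path: str) -> str:
    """Guess the input encoding from a path like .../man/<lang>/man1/ls.1."""
    parts = PurePosixPath(manpage_path).parts
    for i, part in enumerate(parts[:-1]):
        if part != "man":
            continue
        lang_dir = parts[i + 1]
        if lang_dir.startswith("man"):
            # e.g. man1: no language directory, so an untranslated page
            return DEFAULT_ENCODING
        lang = lang_dir.split("_", 1)[0]  # "de_DE" -> "de"
        return ASSUMED_LEGACY_ENCODING.get(lang, DEFAULT_ENCODING)
    return DEFAULT_ENCODING
```

With a scheme like this, nothing inside the file can override the guess, which is why a UTF-8 page installed under a directory mapped to a legacy encoding comes out unreadable.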

Users of some languages, like mine, will need UTF8 anyway, and will
set their man conf correspondingly. In any case, UTF8 will be used
increasingly. It is, or should be, the standard encoding for
internationalization.

Yes, I think exactly so.  I hear Fedora has required manpages to be
encoded in UTF-8, and Debian should do so in the future, or at least
support UTF-8 manpages in some way for languages that have used
non-UTF8 ones (e.g. by detecting input encodings, or by completely
switching to UTF-8).

Yes, indeed. :)

However, completely switching manpage encodings to UTF-8 will require
both modification of man-db's encoding handling and conversion of all
the old non-UTF8 manpages (e.g. EUC-JP ones used for Japanese, and
latin-1 ones used for European languages), which will be a big
transition and will be controversial.  Actually, it was already
controversial in a thread[1] that started from my post this April on
a Japanese list.

I can see that this will affect some languages which had viable encoding options before UTF8. Users of my language have been greatly motivated to use UTF8 by the fact that our legacy encodings are really clumsy and difficult to use. Plus, a complete dearth of translators means we have no mainstream translated manpages (and not much of anything else) anyway!

But I know there has been mainstream and ongoing documentation for some other languages. It will be quite a task converting those docs. Maybe, with manpages, this task could be combined with reviewing manpages for currency. I know there is some concern over translated manpages which turn out not to be current, and in some cases are very old indeed.
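
As a rough idea of what the per-file conversion step might look like, here is a minimal sketch. The function name is made up for this example, and the caller is assumed to know the legacy encoding of each page (e.g. EUC-JP or latin-1); it is not an existing Debian or man-db tool.

```python
def convert_to_utf8(raw: bytes, legacy_encoding: str) -> bytes:
    """Return the manpage source re-encoded as UTF-8.

    Heuristic: if the bytes already decode as UTF-8 (pure ASCII
    included), they are returned unchanged; otherwise they are
    decoded with the legacy encoding the caller supplies.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if legacy_encoding is also wrong,
    # which is safer than silently writing a garbled page.
    return raw.decode(legacy_encoding).encode("utf-8")
```

The "already UTF-8" test is only a heuristic, so a converted page would still need a human look, which fits well with reviewing it for currency at the same time.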

So, in the first mail, I proposed determining some policy about
encodings of manpages and creating a system to check manpage
encodings in the future.  After that, or in parallel, we should do
some transition or support.

Sounds good to me. :)
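
A checking system could start very small. The sketch below is only an assumption about how such a check might work, with invented function names: it walks a manpage tree and sorts each file into "ascii", "utf-8" or "other" (some legacy encoding), so the result can be compared against whatever policy is agreed for each language.

```python
import gzip
from pathlib import Path


def classify(data: bytes) -> str:
    """Classify raw manpage bytes as 'ascii', 'utf-8' or 'other'."""
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "other"  # presumably a legacy encoding such as EUC-JP
    return "utf-8"


def read_manpage_bytes(path: Path) -> bytes:
    """Read a manpage, transparently decompressing .gz files."""
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as handle:
            return handle.read()
    return path.read_bytes()


def scan_manpages(man_root: Path) -> dict:
    """Map every file under man_root to its encoding class."""
    report = {}
    for path in sorted(man_root.rglob("*")):
        if path.is_file():
            report[path] = classify(read_manpage_bytes(path))
    return report
```

For etch, the interesting entries would be "utf-8" pages under language directories where only a non-UTF8 encoding is approved; after a full transition, it would be the "other" ones.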

But meanwhile, we need a way to label non-UTF8 manpages, or manpages
which don't match the man conf encoding setting. Bruno, what do you

Yes, a number of non-UTF8 manpages are now used for many languages for
which only non-UTF8 ones are approved.  And at least for etch, we now
need conversion of UTF-8 manpages for those languages.

[1] http://lists.debian.or.jp/debian-users/200604/threads.html#00078 (in Japanese)

(My Japanese is terrible, but...)

Yes, that's a fairly strong discussion. And dancer has a point, that we have to be careful about changing anything the system, or the software using it, expects to be in a certain format.

But I think we can do it, in time. :)

from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do)
http://groups-beta.google.com/group/vi-VN

