
Please keep an eye on manpage encoding issues (especially in etch).



Hi debian-i18n members,

Manpage encoding problems have been seen in some packages and for some
languages: some manpages are encoded in UTF-8 and are unreadable in any
environment.

Since older manpages carry no encoding information themselves, each
parent directory of translated manpages has a default input manpage
encoding, which is defined in src/encodings.c of the man-db program
(e.g. Japanese manpages under /usr/share/man/ja are all assumed to be
EUC-JP-encoded files).  In the past, no one violated this convention
and it worked well, but these days the automatic generation of
manpages from other formats and the trend toward UTF-8 sometimes
break it.

I've found the following four bug reports on wrongly encoded manpages,
which affect the upcoming etch release:

Bug#391061 (aptitude):
 Japanese, due to DocBook XSL, open in testing (0.4.3-1),
 fixed in 0.4.4-1 (unstable).
Bug#391699 (apt):
 Japanese, due to DocBook XSL, open,
 patch available but local regeneration before package rebuild required.
Bug#395503 (manpage-es):
 Spanish, open.
Bug#397953 (debhelper):
 French, open.

I'm afraid more packages are affected by this issue.

Since the etch release is approaching, and I think unreadable manpages
are important yet easy-to-fix bugs, I'd like to squash this issue.

For Japanese manpages, Junichi Uekawa confirmed that only the manpages
in apt and aptitude are affected, using the following procedure[1]:

(1) Extract the /usr/share/man/ja entries from Contents-i386.gz and
    unpack the corresponding packages.

(2) Run the following one-liner and look for manpages that cause errors:
    for A in $(find usr/share/man/ja/ -type f) ;do echo ---------$A ; \
      zcat "$A" | iconv -f euc-jp -t euc-jp > /dev/null; done
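
As a rough sketch of how step (1) might be scripted: the function
below lists the packages that ship files under a given manpage
directory.  The function name and its arguments are my own
illustration, and it assumes the usual Contents layout of a file path
followed by a comma-separated list of section/package entries.

```shell
#!/bin/sh
# list_packages_with_manpages CONTENTS_GZ MANDIR
# Print the names of packages that ship files under MANDIR.
# (Hypothetical helper; assumes "path  section/pkg[,section/pkg...]" lines.)
list_packages_with_manpages() {
    contents=$1
    mandir=$2   # e.g. usr/share/man/ja/ (Contents paths have no leading slash)
    gzip -dc "$contents" |
        # Keep only lines whose path starts with MANDIR; the last field
        # holds the owning packages.
        awk -v prefix="$mandir" 'index($1, prefix) == 1 { print $NF }' |
        # One package entry per line.
        tr ',' '\n' |
        # Strip the section prefix ("admin/apt" becomes "apt").
        sed 's|.*/||' |
        sort -u
}
```

It could be called as
"list_packages_with_manpages Contents-i386.gz usr/share/man/ja/";
each listed package can then be fetched and unpacked into the current
directory with "dpkg-deb -x <package>.deb .".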

This procedure should be applicable to other encoding-directory pairs.
Since I cannot find out the encoding-directory relationships for other
languages without reading src/encodings.c, I would be grateful if you
could check the languages you are familiar with.
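
To make the check in step (2) easier to reuse, here is a minimal
sketch that takes the directory and the expected encoding as
arguments and prints only the offending files.  The function name is
hypothetical, and the right encoding for each directory still has to
be looked up in src/encodings.c; I only know the ja/EUC-JP pair for
certain.

```shell
#!/bin/sh
# check_man_encoding DIR ENCODING
# Print every gzipped manpage under DIR that is not valid ENCODING.
# (Hypothetical helper, not part of man-db.)
check_man_encoding() {
    dir=$1
    enc=$2
    find "$dir" -type f -name '*.gz' | while IFS= read -r page; do
        # Converting from ENCODING to itself makes iconv exit non-zero
        # when the input contains a byte sequence invalid in ENCODING.
        if ! gzip -dc "$page" | iconv -f "$enc" -t "$enc" > /dev/null 2>&1; then
            printf '%s\n' "$page"
        fi
    done
}
```

For Japanese this would be run as
"check_man_encoding usr/share/man/ja EUC-JP" from the directory where
the packages were unpacked; an empty result means no wrongly encoded
manpage was found.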

I also propose that, in the future (post-etch), we document the
encoding-directory relationships of manpages somewhere and check them
with package checkers (lintian, linda, and piuparts), so that package
maintainers are alerted to this issue.

[1] http://lists.debian.or.jp/debian-devel/200610/msg00010.html (in Japanese)

Regards,

-nori


