We seem to be moving to a de facto standard of UTF-8 for non-ASCII
characters in debian/control files. This is not specified in Policy
[1], but for hopefully obvious reasons, consistency is a Good Thing,
and UTF-8 seems to be the best solution for this sort of thing.
In my sid control files, I see 841 lines with non-ASCII characters,
mostly (761 lines) in Maintainer and Uploaders fields:
    perl -ne 'print if m/[\x80-\xff]/' /var/lib/apt/lists/* | wc -l
Of these, 747 lines are UTF-8 and 94 lines are not.[2]
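For anyone who wants to reproduce the UTF-8 versus non-UTF-8 split
without iconv, here is a rough Python sketch of the same idea as in [2]:
keep only lines containing a byte in the 0x80-0xff range, then see
whether each one survives a strict UTF-8 decode. The function names and
the sample Maintainer lines are made up for illustration; they are not
part of any Debian tool.

```python
def has_non_ascii(line: bytes) -> bool:
    """True if the line contains any byte in the range 0x80-0xff."""
    return any(b >= 0x80 for b in line)


def is_valid_utf8(line: bytes) -> bool:
    """True if the bytes decode cleanly as strict UTF-8."""
    try:
        line.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def classify(lines):
    """Return (utf8_count, other_count) over the non-ASCII lines only."""
    utf8 = other = 0
    for line in lines:
        if not has_non_ascii(line):
            continue  # pure ASCII lines are not interesting here
        if is_valid_utf8(line):
            utf8 += 1
        else:
            other += 1
    return utf8, other


# Hypothetical sample input, not taken from any real package.
sample = [
    b"Maintainer: Jos\xc3\xa9 Example <jose@example.org>\n",  # UTF-8
    b"Maintainer: Jos\xe9 Example <jose@example.org>\n",      # ISO-8859-1
    b"Package: hello\n",                                      # pure ASCII
]
print(classify(sample))  # (1, 1)
```

To run this over real data, the files under /var/lib/apt/lists/ would
have to be opened in binary mode so that no decoding happens before the
check.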
I hate to suggest a mass bug filing (33 source packages), since it's a
mere de facto standard. And I'm certainly not in the mood to campaign
for a Policy amendment. But it would be a Good Thing to aim for
consistency here. Current UI tools (dpkg, dselect, apt-cache,
aptitude) seem to know nothing about character sets, and just pass
characters verbatim to the terminal, but one can easily imagine a tool
that would convert to a user's local character set when possible.
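To make that concrete, here is a hypothetical sketch in Python of what
such a conversion step could look like. None of the tools named above
does this, and the function name is invented. It only works if the tool
can assume the field is UTF-8 to begin with, which is the whole point of
asking for consistency.

```python
import locale
from typing import Optional


def for_terminal(field: bytes, target: Optional[str] = None) -> bytes:
    """Convert a UTF-8 control field to the user's local character set.

    Assumes the field is valid UTF-8. Characters that the target
    charset cannot represent are replaced with '?'.
    """
    if target is None:
        # Charset of the current locale, e.g. 'UTF-8' or 'ISO-8859-1'.
        target = locale.getpreferredencoding(False)
    text = field.decode("utf-8")
    return text.encode(target, errors="replace")


# Hypothetical Maintainer field, stored as UTF-8.
field = "Maintainer: Jos\u00e9 Example <jose@example.org>".encode("utf-8")
print(for_terminal(field, "iso-8859-1"))
print(for_terminal(field, "ascii"))
```

With a Latin-1 terminal the accented character comes out as the single
byte 0xe9; with a plain ASCII terminal it degrades to '?'. A field in an
unknown legacy encoding would make the decode step fail or produce
garbage, since the tool has no way to guess the source charset.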
I suggest that the affected source packages[3] be run through the
command 'iconv -f ORIGINAL_CHARSET -t utf-8' as soon as convenient.
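For maintainers who would rather script it, a minimal Python sketch of
the same conversion follows. The function name is made up, and
original_charset plays the role of ORIGINAL_CHARSET above: it has to be
supplied by someone who knows what the file was written in, since it
cannot be detected reliably. Input that is already valid UTF-8 is
returned unchanged, so running it twice does no harm.

```python
def to_utf8(data: bytes, original_charset: str) -> bytes:
    """Re-encode data as UTF-8, leaving valid UTF-8 input untouched."""
    try:
        data.decode("utf-8")
        return data  # already UTF-8 (or pure ASCII), nothing to do
    except UnicodeDecodeError:
        # Not UTF-8: interpret the bytes in the charset the caller named.
        return data.decode(original_charset).encode("utf-8")


# Hypothetical Uploaders field stored as ISO-8859-1.
old = b"Uploaders: Jos\xe9 Example <jose@example.org>"
new = to_utf8(old, "iso-8859-1")
print(new)
```

The result would then replace the affected field in debian/control.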
Would people support a mass bug at minor severity?
Peter
[1] Note that UTF-8 *is* recommended for debian/changelog.
http://www.debian.org/doc/debian-policy/ap-pkg-sourcepkg.html#s-pkg-dpkgchangelog
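[2] It is easy to tell if text is UTF-8 or not; I use the exit status
of "iconv -f utf-8 -t utf-8", which is nonzero when the input is not
valid UTF-8. This gives very few false positives (legacy-encoded text
that happens to pass as UTF-8), because UTF-8 has a very strict format.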
[3] abcm2ps                     freecraft                  maint-guide
    ap-utils                    gl-117                     movixmaker-2
    appunti-informatica-libera  glade-perl                 mozilla-locale-hu
    ayuda                       gnustep-icons              myspell-sv
    boa                         gridlock                   ntfsdoc
    boa-constructor             gtkdiskfree                pdftohtml
    bombermaze                  gtodo                      pdp
    bonsai                      iris                       pyca
    cadubi                      itcl3                      pyro
    cantus                      kernel-patch-2.4.26-s390   pythoncad
    coq-doc                     kernel-patch-2.4.27-s390   rat
    crafted                     krb4                       strategoxt
    darkstat                    lg-issue46                 sympa
    ddclient                    libcgi-validate-perl       syslog-ng
    doc-linux-html-pt           libconfig-general-perl     tuxeyes
    doc-linux-text-pt           libexporter-lite-perl      unac
    drpython                    libtext-unaccent-perl      wmblob
    elmo                        libuniversal-exports-perl  wmnetmon
    fcmp                        linux-ntfs                 wordtrans
    fortunes-fr                 linux-tutorial-es          wprint
    fortunes-it