
charsets in debian/control



We seem to be moving to a de facto standard of UTF-8 for non-ASCII
characters in debian/control files.  This is not specified in Policy
[1], but for hopefully obvious reasons, consistency is a Good Thing,
and UTF-8 seems to be the best solution for this sort of thing.

In the package lists on my sid system, I see 841 lines with non-ASCII
characters, mostly (761 lines) in Maintainer and Uploaders fields:

  perl -ne 'print if m/[\x80-\xff]/' /var/lib/apt/lists/* | wc -l

Of these, 747 lines are UTF-8 and 94 lines are not.[2]

I hate to suggest a mass bug filing (33 source packages), since it's a
mere de facto standard.  And I'm certainly not in the mood to campaign
for a Policy amendment.  But it would be a Good Thing to aim for
consistency here.  Current UI tools (dpkg, dselect, apt-cache,
aptitude) seem to know nothing about character sets, and just pass
characters verbatim to the terminal, but one can easily imagine a tool
that would convert to a user's local character set when possible.

I suggest that the affected source packages[3] be run through the
command 'iconv -f ORIGINAL_CHARSET -t utf-8' as soon as convenient.
Would people support a mass bug at minor severity?
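
For anyone who would rather script the conversion than call iconv by
hand, a rough Python equivalent might look like the sketch below.  It
is only an illustration: to_utf8 is a made-up name, and the original
charset (ISO-8859-1 in the usage note) still has to be known or guessed
by the maintainer, since it cannot be detected reliably.

```python
def to_utf8(data: bytes, original_charset: str) -> bytes:
    """Re-encode data to UTF-8, roughly like
    'iconv -f ORIGINAL_CHARSET -t utf-8'.

    original_charset is an assumption supplied by the caller.
    """
    try:
        # Already valid UTF-8 (plain ASCII included): leave it alone,
        # so that running the conversion twice does not double-encode.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Not UTF-8: interpret the bytes in the stated charset and
        # write them back out as UTF-8.
        return data.decode(original_charset).encode("utf-8")
```

For example, to_utf8(open("debian/control", "rb").read(), "iso-8859-1")
would return the converted bytes, which can then be written back.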

Peter

[1] Note that UTF-8 *is* recommended for debian/changelog.
    http://www.debian.org/doc/debian-policy/ap-pkg-sourcepkg.html#s-pkg-dpkgchangelog

[2] It is easy to tell if text is UTF-8 or not; I use the exit status
    of "iconv -f utf-8 -t utf-8".  This gives very few false positives,
    because UTF-8 has a very strict format.
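
    The same test can be sketched in Python without iconv.  This is
    only an illustration, not the script used for the counts above;
    the names is_utf8 and count_charsets are made up here.  Pure ASCII
    lines are skipped, since they are trivially valid UTF-8.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def count_charsets(lines):
    """Count non-ASCII lines, split into (utf8, other).

    lines is any iterable of bytes objects, e.g. a file opened in
    binary mode.
    """
    utf8 = other = 0
    for line in lines:
        if line.isascii():
            continue  # no bytes in \x80-\xff, nothing to classify
        if is_utf8(line):
            utf8 += 1
        else:
            other += 1
    return utf8, other
```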

[3] abcm2ps                     freecraft                   maint-guide
    ap-utils                    gl-117                      movixmaker-2
    appunti-informatica-libera  glade-perl                  mozilla-locale-hu
    ayuda                       gnustep-icons               myspell-sv
    boa                         gridlock                    ntfsdoc
    boa-constructor             gtkdiskfree                 pdftohtml
    bombermaze                  gtodo                       pdp
    bonsai                      iris                        pyca
    cadubi                      itcl3                       pyro
    cantus                      kernel-patch-2.4.26-s390    pythoncad
    coq-doc                     kernel-patch-2.4.27-s390    rat
    crafted                     krb4                        strategoxt
    darkstat                    lg-issue46                  sympa
    ddclient                    libcgi-validate-perl        syslog-ng
    doc-linux-html-pt           libconfig-general-perl      tuxeyes
    doc-linux-text-pt           libexporter-lite-perl       unac
    drpython                    libtext-unaccent-perl       wmblob
    elmo                        libuniversal-exports-perl   wmnetmon
    fcmp                        linux-ntfs                  wordtrans
    fortunes-fr                 linux-tutorial-es           wprint
    fortunes-it
