Bug#174982: [PROPOSAL]: Debian changelogs should be UTF-8 encoded
Support for Unicode, and specifically UTF-8, is steadily increasing
among popular applications in Debian. For example, in unstable, GNOME 2
has excellent support (almost level 2) in almost all its applications;
the big remaining one is gnome-terminal, which needs a development
version in order to support UTF-8 (available in Debian experimental now
if you want to play). I think that by the time sarge
is released, UTF-8 support will start to hit critical mass.
I think it is fairly obvious that we need to eventually transition to
UTF-8 for our package infrastructure; it is really the only sane charset
in an international environment. Now, we can't switch to using UTF-8
for package control fields and the like until dpkg has better support,
but one thing we can start doing today is requiring that Debian
changelogs be UTF-8 encoded.
Right now, people are putting whatever random characters they feel like
in Debian changelogs; they might be encoded in ISO-8859-1, BIG5,
ISO-8859-2, ISO-2022-JP, or who knows what. This does come up in the
real world; I use apt-listchanges, and I fairly often see broken
characters in changelogs. The solution is to define the charset of
changelogs as UTF-8. That way, I can read all the changelogs at once
(currently using gnome-terminal) and they will all display correctly.
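To illustrate the kind of check such a requirement makes possible, here
is a minimal Python sketch. It is not part of this proposal or of any
existing Debian tool; the function names and the default path
debian/changelog are assumptions made for the example. It simply tries
to decode the file as UTF-8 and reports where decoding first fails.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


def check_changelog(path: str = "debian/changelog") -> bool:
    """Report whether the file at `path` decodes cleanly as UTF-8.

    The default path is an assumption for this sketch.
    """
    with open(path, "rb") as handle:
        offset = first_invalid_utf8_offset(handle.read())
    if offset is not None:
        print(f"{path}: invalid UTF-8 at byte offset {offset}")
        return False
    return True
```

A check like this would likely flag an ISO-8859-1 or ISO-8859-2
changelog as soon as it contains an accented character, but it cannot
flag ISO-2022-JP: that encoding uses only 7-bit bytes, so it always
decodes as valid (if unreadable) UTF-8.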
This proposal is an important yet easy first step toward transitioning
all of Debian to UTF-8.
Attached is a patch against the latest version of policy.
--- policy.sgml~ 2002-11-15 01:49:40.000000000 -0500
+++ policy.sgml 2003-01-01 21:59:26.000000000 -0500
@@ -2257,6 +2257,13 @@
separated by exactly two spaces.
+ The entire changelog must be encoded in the
+ <url id="http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html" name="UTF-8">
+ encoding of
+ <url id="http://www.unicode.org/" name="Unicode">.
<sect1><heading>Defining alternative changelog formats</heading>