Package: debian-policy
Version: 3.6.1.0
Severity: wishlist
Tags: patch

This proposal aims to recommend UTF-8 encoding not only for
debian/changelog, but also for debian/control.

The patch is attached, along with the resulting plain text for easier
reading.

Kind regards,
Martin
--- debian-policy-3.6.1.0.orig/policy.sgml	Tue Aug 19 14:32:23 2003
+++ debian-policy-3.6.1.0/policy.sgml	Sun Aug 31 13:30:14 2003
@@ -2250,6 +2250,12 @@
 	  See <ref id="substvars"> for details.
 	</p>
 
+	<p>
+	  It is recommended that the control fields be encoded in
+	  UTF-8 encoding, see <ref id="pkg-dpkgchangelog"> for
+	  further information on this.
+	</p>
+
       </sect>
 
       <sect id="binarycontrolfiles">
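The recommendation added by the patch lends itself to a mechanical check. Below is a minimal sketch, assuming Python 3.9 or later, that decodes a debian/control file strictly as UTF-8, splits it into the source paragraph and the binary package paragraphs, and reports mandatory fields missing according to the field lists in policy section 5.2 (reproduced in the plain text below). The helper names parse_control and missing_fields are hypothetical, and the parser is deliberately simplified: field names are matched case-sensitively and variable references are not expanded. It is not how dpkg itself reads control files.

```python
# Mandatory fields taken from the field lists in policy section 5.2.
SOURCE_MANDATORY = ("Source", "Maintainer")
BINARY_MANDATORY = ("Package", "Architecture", "Description")


def parse_control(data: bytes) -> list[dict[str, str]]:
    """Hypothetical helper: split a debian/control file into paragraphs.

    Raises UnicodeDecodeError if the file is not valid UTF-8, i.e. if it
    does not follow the proposed recommendation.
    """
    text = data.decode("utf-8")  # strict mode: invalid sequences raise
    paragraphs: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field is not None:
            # Continuation line, e.g. the extended Description.
            current[last_field] += "\n" + line
        else:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def missing_fields(paragraphs: list[dict[str, str]]) -> list[tuple[int, str]]:
    """Hypothetical helper: list (paragraph index, field) for absent mandatory fields.

    The first paragraph describes the source package; every later
    paragraph describes one binary package.
    """
    problems: list[tuple[int, str]] = []
    for index, para in enumerate(paragraphs):
        required = SOURCE_MANDATORY if index == 0 else BINARY_MANDATORY
        problems += [(index, field) for field in required if field not in para]
    return problems
```

For example, passing the bytes of a control file whose Maintainer field was written in ISO-8859-1 makes parse_control raise UnicodeDecodeError; a valid UTF-8 file yields one dictionary per paragraph, which missing_fields then inspects.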
5.2. Source package control files -- `debian/control'
-----------------------------------------------------

The `debian/control' file contains the most vital (and
version-independent) information about the source package and about
the binary packages it creates.

The first paragraph of the control file contains information about
the source package in general. The subsequent sets each describe a
binary package that the source tree builds.

The fields in the general paragraph (the first one, for the source
package) are:

   * `Source' (mandatory)
   * `Maintainer' (mandatory)
   * `Section' (recommended)
   * `Priority' (recommended)
   * `Build-Depends' et al
   * `Standards-Version' (recommended)

The fields in the binary package paragraphs are:

   * `Package' (mandatory)
   * `Architecture' (mandatory)
   * `Section' (recommended)
   * `Priority' (recommended)
   * `Essential'
   * `Depends' et al
   * `Description' (mandatory)

The syntax and semantics of the fields are described below.

These fields are used by `dpkg-gencontrol' to generate control files
for binary packages (see below), by `dpkg-genchanges' to generate the
`.changes' file to accompany the upload, and by `dpkg-source' when it
creates the `.dsc' source control file as part of a source archive.

The fields here may contain variable references - their values will be
substituted by `dpkg-gencontrol', `dpkg-genchanges' or `dpkg-source'
when they generate output control files. See Section 4.9, `Variable
substitutions: `debian/substvars'' for details.

It is recommended that the control fields be encoded in UTF-8
encoding, see Section C.2.2, ``debian/changelog'' for further
information on this.

C.2.2. `debian/changelog'
-------------------------

See Section 4.4, `Debian changelog: `debian/changelog''.
It is recommended that the entire changelog be encoded in the UTF-8
(http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html) encoding of
Unicode (http://www.unicode.org/).[1]

[1]  Support for Unicode, and specifically UTF-8, is steadily
     increasing among popular applications in Debian. For example, in
     unstable, GNOME 2 has excellent support (almost level 2) in almost
     all its applications; the big remaining one is gnome-terminal,
     which requires development versions in order to support UTF-8
     (available in Debian experimental now if you want to play). I
     think that by the time sarge is released, UTF-8 support will start
     to hit critical mass.

     I think it is fairly obvious that we need to eventually transition
     to UTF-8 for our package infrastructure; it is really the only sane
     charset in an international environment. Now, we can't switch to
     using UTF-8 for package control fields and the like until dpkg has
     better support, but one thing we can start doing today is
     requesting that Debian changelogs be UTF-8 encoded. At some point,
     we can start requiring them to be.

     Checking for non-UTF-8 characters in a changelog is trivial. Dump
     the file through `iconv -f utf-8 -t ucs-4', discard the output, and
     check the return value. If there are any characters in the stream
     which are invalid UTF-8 sequences, iconv will exit with an error
     code, and non-ASCII text in almost any other character set will
     contain such sequences.
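The iconv check described in footnote [1] can also be written directly in Python, for instance inside a lint script. This is a minimal sketch, assuming Python 3; the function names first_invalid_utf8_offset and check_file are hypothetical. Python's strict UTF-8 decoder follows RFC 3629 rather than RFC 2279, so it rejects 5- and 6-byte sequences and code points above U+10FFFF; on such unusual input the result may differ from iconv.

```python
import sys
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict mode, comparable to iconv -f utf-8
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path: str) -> int:
    """Return a shell-style exit status: 0 if the file is valid UTF-8, 1 if not."""
    with open(path, "rb") as f:
        offset = first_invalid_utf8_offset(f.read())
    if offset is None:
        return 0
    print(f"{path}: invalid UTF-8 at byte {offset}", file=sys.stderr)
    return 1
```

A script would call sys.exit(check_file("debian/changelog")), mirroring how the footnote uses the iconv exit status; reporting the byte offset is an addition that makes the offending entry easier to find.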