Bug#241333: Mandate UTF-8 for changelog files
Russ Allbery <rra@debian.org> writes:
> The reason why I dropped the RFC reference is that there are multiple
> references to UTF-8 all over Policy these days and I don't really want
> to footnote all of them. I'm not sure of the best way to handle this.
> Maybe we need some sort of introductory mention of UTF-8 somewhere?
Here's a revised patch that adds a definition section, currently only
defining ASCII and UTF-8. I haven't included the iconv trick; I think
that the Developers Reference or Lintian are better places for that. But
I don't feel strongly about it and am willing to change my mind if people
disagree.
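
For reference, the iconv trick amounts to decoding the file strictly as
UTF-8 and treating any decoding error as a failure. Here's a rough
Python sketch of the same idea. The function name is_valid_utf8 is
purely illustrative and does not come from Policy, Lintian, or the
Developers Reference.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence.

    Hypothetical helper: it mirrors the idea behind piping a file
    through "iconv -f utf-8" and checking the exit status.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Because ASCII is a subset of UTF-8, a pure-ASCII changelog passes this
check, while a Latin-1 file containing accented characters will
typically fail it.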
diff --git a/policy.sgml b/policy.sgml
index 24c9072..219664d 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -273,6 +273,32 @@
</p>
</sect>
+ <sect id="definitions">
+ <heading>Definitions</heading>
+
+ <p>
+ The following terms are used in this Policy Manual:
+ <taglist>
+ <tag>ASCII</tag>
+ <item>
+ The character encoding specified by ANSI X3.4-1986 and its
+ predecessor standards, referred to in MIME as US-ASCII, and
+ corresponding to an encoding in eight bits per character of
+ the first 128 <url id="http://www.unicode.org/"
+ name="Unicode"> characters, with the eighth bit always zero.
+ </item>
+ <tag>UTF-8</tag>
+ <item>
+ The transformation format (sometimes called encoding) of
+ <url id="http://www.unicode.org/" name="Unicode"> defined by
+ <url id="http://www.rfc-editor.org/rfc/rfc3629.txt"
+ name="RFC 3629">. UTF-8 has the useful property of having
+ ASCII as a subset, so any text encoded in ASCII is trivially
+ also valid UTF-8.
+ </item>
+ </taglist>
+ </p>
+ </sect>
</chapt>
@@ -1473,10 +1499,6 @@
</p>
<p>
-
- </p>
-
- <p>
The format of the <file>debian/changelog</file> allows the
package building tools to discover which version of the package
is being built and find out other release-specific information.
@@ -1582,6 +1604,10 @@
</p>
<p>
+ The entire changelog must be encoded in UTF-8.
+ </p>
+
+ <p>
For more information on placement of the changelog files
within binary packages, please see <ref id="changelogs">.
</p>
@@ -9822,36 +9848,6 @@ install-info --quiet --remove /usr/share/info/foobar.info
See <ref id="dpkgchangelog">.
</p>
- <p>
- It is recommended that the entire changelog be encoded in the
- <url id="http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html" name="UTF-8">
- encoding of
- <url id="http://www.unicode.org/"
- name="Unicode">.<footnote>
- <p>
- I think it is fairly obvious that we need to
- eventually transition to UTF-8 for our package
- infrastructure; it is really the only sane char-set in
- an international environment. Now, we can't switch to
- using UTF-8 for package control fields and the like
- until dpkg has better support, but one thing we can
- start doing today is requesting that Debian changelogs
- are UTF-8 encoded. At some point in time, we can start
- requiring them to do so.
- </p>
- <p>
- Checking for non-UTF8 characters in a changelog is
- trivial. Dump the file through
- <example>iconv -f utf-8 -t ucs-4</example>
- discard the output, and check the return
- value. If there are any characters in the stream
- which are invalid UTF-8 sequences, iconv will exit
- with an error code; and this will be the case for the
- vast majority of other character sets.
- </p>
- </footnote>
- </p>
-
<sect2><heading>Defining alternative changelog formats
</heading>
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>