Bug#99933: [PROPOSAL]: encourage use of utf-8 in documentation and clarify encoding issues
The following proposed addition to policy clarifies encoding issues and
prepares for an eventual migration to UTF-8 (see Bug#99324).
Note the use of the word "should": these are not strict requirements.
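As a rough illustration of what the wording in the patch below asks maintainers to check, here is a minimal Python sketch. It is not part of the proposed policy text; the function names `classify_encoding` and `to_utf8` are hypothetical, and the sample name is made up. It sorts a byte string into pure ASCII, valid UTF-8, or something else, and re-encodes text from a known legacy encoding to UTF-8.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for the given bytes.

    Hypothetical helper: 'other' only means the data is neither pure
    ASCII nor valid UTF-8; it does not identify the legacy encoding.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding to UTF-8.

    The caller has to supply the source encoding (for example
    'iso-8859-2'); it cannot be detected reliably from the bytes.
    """
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    legacy = "Jiří".encode("iso-8859-2")  # made-up sample name
    print(classify_encoding(legacy))
    print(classify_encoding(to_utf8(legacy, "iso-8859-2")))
```

A check like this could be run over `debian/changelog` or `debian/control` before upload. Since the proposal uses "should" rather than "must", a result of "other" would be a warning at most.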
--- policy.sgml-old Fri Jun 1 11:40:16 2001
+++ policy.sgml Thu Jun 7 13:31:09 2001
@@ -1653,6 +1653,15 @@
+ <sect id="controlencoding"><heading>Encoding of control files</heading>
+ If, for whatever reason (such as upstream authors' or maintainers'
+ names, foreign-language package descriptions, and similar), you need to
+ use characters outside the 7-bit ASCII range in control files, these
+ characters should be encoded in UTF-8.
<chapt id="versions"><heading>Version numbering</heading>
@@ -2276,8 +2285,16 @@
+ <sect1><heading>Character set of <tt>debian/changelog</tt></heading>
+ The character set of <tt>debian/changelog</tt> should be either pure ASCII or UTF-8.
and variable substitutions </heading>
@@ -7370,6 +7387,26 @@
+ Documentation of Debian packages in text format, if written in
+ a language requiring characters outside the 7-bit ASCII range,
+ should use either a well-established encoding for the given
+ language <footnote>such as ISO-8859-2 for some Central and Eastern
+ European languages, KOI8-R for Russian, etc.</footnote>, or UTF-8.
+ Maintainers are encouraged to use UTF-8, bearing in mind
+ the general Debian migration toward a unified character encoding.
+ Original upstream documentation, if in an encoding other than UTF-8
+ or the well-established encoding for the particular language,
+ should be converted either to UTF-8 or to the well-established
+ encoding. The choice between UTF-8 and the other encoding is left to the
+ maintainer's discretion; however, a package should have all its
+ documentation for a given language in one consistent encoding.
@@ -7440,6 +7477,18 @@
Other formats such as PostScript may be provided at the
package maintainer's discretion.
+ HTML documents, if in an encoding other than <tt>us-ascii</tt>, should
+ have in their header an appropriate META tag declaring
+ the encoding used.
+ <META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
@@ -7555,6 +7604,24 @@
changelog, then the Debian changelog should still be called
+ <sect id="charset">
+ <heading>Default character set</heading>
+ Names of maintainers, upstream authors and other data in
+ packages' descriptions and related Debian data files (such as
+ <tt>debian/changelog</tt>, <tt>debian/copyright</tt>,
+ <tt>debian/control</tt>), as well as in English-language
+ documentation, should be either transliterated or
+ transcribed to ASCII, or written in UTF-8 at the
+ discretion of the maintainer. However, for names
+ in scripts based on non-Latin alphabets, an ASCII (or suitable
+ Latin-script) version should be provided along with the original