[PROPOSAL]: encourage use of UTF-8 in documentation and clarify encoding issues
Submitted to the BTS, but since master is down, I am also posting a copy to
debian-policy.
Package: debian-policy
Version: 3.5.5.0
Severity: wishlist
The following proposed addition to policy clarifies encoding issues and
prepares for an eventual migration to UTF-8 (see Bug#99324).
Note the use of the word "should": these are not strict requirements.
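As an illustration only, here is one way a maintainer could check files against the "pure ASCII or UTF-8" rule proposed below for control files and debian/changelog. This is a minimal Python 3 sketch, not an existing Debian tool. The names classify_encoding and nonconforming_files are hypothetical. It only tests whether the bytes decode cleanly, so it cannot tell which legacy encoding a failing file actually uses.

```python
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper: it only tests whether the bytes decode
    cleanly; it cannot say which legacy encoding 'other' data is in.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def nonconforming_files(paths: list[str]) -> list[str]:
    """Return the paths whose contents are neither pure ASCII nor UTF-8."""
    return [p for p in paths
            if classify_encoding(Path(p).read_bytes()) == "other"]
```

For example, nonconforming_files(["debian/changelog", "debian/control"]) returns the files that would still need re-encoding. Pure ASCII is checked first because every ASCII file is also valid UTF-8.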
--- policy.sgml-old Fri Jun 1 11:40:16 2001
+++ policy.sgml Thu Jun 7 13:31:09 2001
@@ -1653,6 +1653,15 @@
</sect>
+
+ <sect id="controlencoding"><heading>Encoding of control files</heading>
+ <p>
+ If, for whatever reason (such as upstream authors' or maintainers'
+ names, foreign-language package descriptions and similar), you need to
+ use characters outside the 7-bit ASCII range in control files, these
+ characters should be encoded in UTF-8.
+ </p>
+ </sect>
</chapt>
<chapt id="versions"><heading>Version numbering</heading>
@@ -2276,8 +2285,16 @@
all.
</p>
</sect1>
+
+ <sect1><heading>Character set of <tt>debian/changelog</tt></heading>
+
+ <p>
+ The character set of <tt>debian/changelog</tt> should be either pure ASCII or UTF-8.
+ </p>
+ </sect1>
</sect>
+
<sect id="srcsubstvars"><heading><tt>debian/substvars</tt>
and variable substitutions </heading>
@@ -7370,6 +7387,26 @@
from <tt>/usr/share/doc/<var>package</var>/</tt>.
</p>
+ <p>
+ Text-format documentation of Debian packages written in a
+ language that requires characters outside the 7-bit ASCII range
+ should use either the well-established encoding for that
+ language<footnote>Such as ISO-8859-2 for some Central and Eastern
+ European languages, KOI8-R for Russian, etc.</footnote> or the UTF-8
+ encoding.
+ Maintainers are encouraged to use UTF-8, bearing in mind
+ the general Debian migration toward a unified character encoding.
+ </p>
+
+ <p>
+ Original upstream documentation that uses an encoding other than UTF-8
+ or the well-established encoding for its language
+ should be converted either to UTF-8 or to the well-established
+ encoding. The choice between UTF-8 and the other encoding is left to
+ the maintainer's discretion; however, all documentation in one package
+ for a given language should use one consistent encoding.
+ </p>
+
</sect>
<sect id="usrdoc">
@@ -7440,6 +7477,18 @@
Other formats such as PostScript may be provided at the
package maintainer's discretion.
</p>
+
+ <p>
+ HTML documents in an encoding other than <tt>us-ascii</tt> should
+ have an appropriate META tag in their header declaring
+ the encoding used.
+
+ Example:
+ <example>
+ &lt;META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=UTF-8"&gt;
+ </example>
+ </p>
+
</sect>
<sect id="copyrightfile">
@@ -7555,6 +7604,24 @@
changelog, then the Debian changelog should still be called
<tt>changelog.Debian.gz</tt>.</p>
</sect>
+
+ <sect id="charset">
+ <heading>Default character set</heading>
+
+ <p>
+ Names of maintainers, upstream authors and other data in
+ package descriptions and related Debian data files (such as
+ <tt>debian/changelog</tt>, <tt>debian/copyright</tt> and
+ <tt>debian/control</tt>), as well as in English-language
+ documentation, should be either transliterated or
+ transcribed to ASCII, or encoded in UTF-8, at the
+ discretion of the maintainer. However, for names
+ in scripts based on non-Latin alphabets, an ASCII (or suitable
+ Latin-script) version should be provided along with the original
+ name.
+ </p>
+ </sect>
+
</chapt>
<appendix id="pkg-scope">
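To make the documentation rules in the patch more concrete, here is a minimal Python 3 sketch of two steps a maintainer might take. The first converts upstream documentation from a known legacy encoding (for example ISO-8859-2 or KOI8-R) to UTF-8. The second checks which charset, if any, an HTML file declares in a META tag. The helper names to_utf8, convert_file and declared_charset are hypothetical, and the regular expression is a simplification that assumes the META tag is written on one line in the usual attribute form, not a full HTML parser.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a charset declaration inside a <meta ...> tag, e.g.
# <META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
# (simplified: not a real HTML parser)
_META_CHARSET = re.compile(r"""<meta[^>]+charset\s*=\s*["']?([\w.:-]+)""",
                           re.IGNORECASE)


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8.

    The maintainer must supply source_encoding (e.g. "iso-8859-2",
    "koi8-r"); this sketch does not guess it. Decoding is strict, so
    bytes invalid in source_encoding raise UnicodeDecodeError.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str) -> None:
    """Write a UTF-8 copy of the text file src to dst."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))


def declared_charset(html: str) -> Optional[str]:
    """Return the charset named in the first META tag, or None if absent."""
    match = _META_CHARSET.search(html)
    return match.group(1) if match else None
```

Re-encoding an HTML file with convert_file does not touch any charset already declared in its META tag. After conversion, declared_charset can be used to spot files whose declaration no longer matches their actual encoding.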