
Bug#99933: [PROPOSAL]: encourage use of utf-8 in documentation and clarify encoding issues

Package: debian-policy
Severity: wishlist

The following proposed addition to policy clarifies encoding issues and
prepares for an eventual later migration to UTF-8 (see Bug#99324).
Note the use of the word "should" - these are not strict requirements.

--- policy.sgml-old	Fri Jun  1 11:40:16 2001
+++ policy.sgml	Thu Jun  7 13:31:09 2001
@@ -1653,6 +1653,15 @@
+      <sect id="controlencoding"><heading>Encoding of control files</heading>
+	<p>
+            If, for whatever reason (such as upstream authors' or maintainers'
+            names, foreign-language package descriptions, and similar), you need to
+            use characters outside the 7-bit ASCII range in control files, these
+            characters should be encoded in UTF-8.
+	</p>
+      </sect>
     <chapt id="versions"><heading>Version numbering</heading>
@@ -2276,8 +2285,16 @@
+	<sect1><heading>Character set of <tt>debian/changelog</tt></heading>
+	  <p>
+            The character set of <tt>debian/changelog</tt> should be either pure ASCII or UTF-8.
+	  </p>
+	</sect1>
       <sect id="srcsubstvars"><heading><tt>debian/substvars</tt>
 	  and variable substitutions	  </heading>
@@ -7370,6 +7387,26 @@
 	  from <tt>/usr/share/doc/<var>package</var>/</tt>.
+	<p>
+          Documentation of Debian packages in text format, if written in
+          a language requiring characters outside the 7-bit ASCII range,
+          should use either a well-established encoding for the given
+          language <footnote>such as ISO-8859-2 for some Central and Eastern
+          European languages, KOI8-R for Russian, etc.</footnote>, or the UTF-8
+          encoding.
+          Maintainers are encouraged to use UTF-8, bearing in mind
+          the general Debian migration toward a unified character encoding.
+	</p>
+	<p>
+          Original upstream documentation, if in an encoding other than UTF-8
+          or the well-established encoding for the particular language,
+          should be converted either to UTF-8 or to the well-established
+          encoding. The choice between UTF-8 and another encoding is left to the
+          maintainer's discretion; however, a package should have all the
+          documentation for a given language in one consistent encoding.
+	</p>
       <sect id="usrdoc">
@@ -7440,6 +7477,18 @@
 	  Other formats such as PostScript may be provided at the
 	  package maintainer's discretion.
+        <p>
+          HTML documents, if in an encoding other than <tt>us-ascii</tt>, should
+          have in their header an appropriate META tag declaring
+          the encoding used.
+          Example:
+          <example>
+            &lt;META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=UTF-8"&gt;
+          </example>
+        </p>
       <sect id="copyrightfile">
@@ -7555,6 +7604,24 @@
 	  changelog, then the Debian changelog should still be called
+      <sect id="charset">
+	<heading>Default character set</heading>
+	<p>
+          Names of maintainers, upstream authors and other data in
+          packages' descriptions and related Debian data files (such as
+          <tt>debian/changelog</tt>, <tt>debian/copyright</tt>,
+          <tt>debian/control</tt>), as well as in English-language
+          documentation, should be either transliterated or
+          transcribed to ASCII, or given in UTF-8 encoding, at the
+          discretion of the maintainer. However, for names
+          in scripts based on non-Latin alphabets, an ASCII (or suitable
+          Latin-script) version should be provided along with the original
+          name.
+        </p>
+       </sect>
     <appendix id="pkg-scope">
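
For maintainers who want to check a file against the wording above, here is a
minimal sketch (not part of the proposed patch). It classifies a file's bytes
as pure ASCII, valid UTF-8, or something else, and re-encodes a legacy file to
UTF-8. The function names classify_encoding and to_utf8 are hypothetical, and
the source encoding has to be supplied by the maintainer, since a legacy
8-bit encoding cannot be detected reliably from the bytes alone.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for the given bytes.

    Hypothetical helper: 'other' only means the bytes are not valid
    UTF-8; it does not say which legacy encoding was used.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding (e.g. 'iso-8859-2') to UTF-8.

    Raises UnicodeDecodeError if the data is not valid in source_encoding.
    """
    return data.decode(source_encoding).encode("utf-8")
```

A possible use is to read debian/changelog in binary mode, pass the bytes to
classify_encoding, and call to_utf8 with the known legacy encoding only when
the result is 'other'.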
