Bug#99933: second attempt at more comprehensive unicode policy
On Thu, 2003-01-02 at 13:57, Colin Walters wrote:
> #99933 goes a lot farther than #174982.
I have a counter-proposal to #99933, which I have attached. I believe
it fixes the problems I raised with your proposal, and should also cover
some new areas (like filenames). I also hopefully fixed James' issue
with the RFC link.
This patch supplants the one in #174982. It is more ambitious than
#174982, but still does not introduce any "must"s, only "should"s or
weaker.
Opinions?
--- policy.sgml 2003-01-01 21:59:26.000000000 -0500
+++ policy.sgml.new 2003-01-02 17:14:56.000000000 -0500
@@ -2258,10 +2258,8 @@
</p>
<p>
- The entire changelog must be encoded in the
- <url id="http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html" name="UTF-8">
- encoding of
- <url id="http://www.unicode.org/" name="Unicode">.
+ The entire changelog should be encoded in UTF-8; see <ref
+ id="unicode"> for more information.
</p>
<sect1><heading>Defining alternative changelog formats</heading>
@@ -4190,6 +4188,31 @@
<sect>
<heading>Filesystem hierarchy</heading>
+ <sect1>
+ <heading>File Names</heading>
+
+ <p>
+ Files included in Debian packages or created by maintainer
+ scripts should have names which are valid UTF-8. Since
+ UTF-8 is fully backwards compatible with ASCII, packages
+ whose file names are plain ASCII already comply.
+ </p>
+
+ <p>
+ Programs should expect filenames in general (whether from
+ a Debian package or created by the user) to be encoded
+ in UTF-8, although programs are encouraged to fall back
+ gracefully to the current locale's encoding if a name is
+ not valid UTF-8. Programs included in Debian packages
+ should, when creating new files, encode their names in
+ UTF-8 by default.
+ </p>
+
+ <p>
+ See <ref id="unicode"> for more information on Debian and
+ Unicode.
+ </p>
+ </sect1>
<sect1>
<heading>Filesystem Structure</heading>
@@ -5414,6 +5437,32 @@
</p>
</sect>
+ <sect id="unicode">
+ <heading>Unicode</heading>
+
+ <p>
+ Debian is moving towards
+ <url id="http://www.unicode.org/" name="Unicode">,
+ and specifically the <url id="http://www.ietf.org/rfc/rfc2279.txt" name="UTF-8">
+ encoding of Unicode, for representation of character data.
+ Unicode is a universal character set, covering the scripts of
+ nearly all the world's languages. Using Unicode makes internationalization
+ much easier, since programs will have to deal with only one
+ character set, instead of many different incompatible
+ national variants.
+ </p>
+
+ <p>
+ The UTF-8 encoding of Unicode is designed for Unix-like
+ systems such as Debian. It is fully backwards compatible
+ with US-ASCII, and is also safe for use in filenames: every byte
+ of a multibyte character is above 0x7F, so no ASCII byte, such as <tt>/</tt> or NUL, can appear inside one.
+ It is highly recommended, although not yet required, for
+ programs included in Debian to support Unicode and
+ specifically UTF-8.
+ </p>
+ </sect>
+
<sect>
<heading>Environment variables</heading>
@@ -7647,6 +7696,42 @@
</p>
<p>
+ All documentation included in a package should be encoded in
+ UTF-8 (see <ref id="unicode"> for more information). If
+ upstream documentation is in another character set, the data
+ should be converted during the package build process.
+ <footnote>
+ <p>
+ One way to do this is to use <prgn>iconv</prgn>, for example:
+<example>
+ for file in ChangeLog doc/README doc/INSTALL; do
+ iconv -f ISO-8859-1 -t UTF-8 "$file" > "$file.new" && mv "$file.new" "$file"
+ done
+</example>
+ </p>
+ </footnote>
+ </p>
+
+ <p>
+ Documentation formats which include a standard means of
+ specifying the character set of the data (such as the
+ <tt>encoding</tt> declaration in XML) may at their option use
+ another character set, although UTF-8 is still preferred.
+ Additionally, documents in formats which can declare their
+ character set but have no reliable default (like HTML)
+ should declare it explicitly.
+ <footnote>
+ <p>
+ For example, the <tt>head</tt> element of an HTML
+ document should include a <tt>meta</tt> element such as:
+<example>
+ &lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8"&gt;
+</example>
+ </p>
+ </footnote>
+ </p>
+
+ <p>
Other formats such as PostScript may be provided at the
package maintainer's discretion.
</p>
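To make the file name and documentation rules above a bit more concrete, here is a rough shell sketch (not part of the patch) of how a maintainer might check a build tree. It assumes GNU (glibc) iconv, which exits non-zero when its input is not valid in the source encoding; the function names (is_utf8, check_names, check_contents, to_utf8) are made up for illustration and are not part of any existing Debian tool. File names containing newlines are not handled.

```shell
#!/bin/sh
# Rough sketch: check file names and file contents for valid UTF-8.
# Assumes GNU (glibc) iconv, which exits non-zero on invalid input.

# is_utf8: succeed if standard input is valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# check_names DIR: print each path under DIR whose name is not valid UTF-8.
# (Paths containing newlines are not handled.)
check_names() {
    find "$1" -print | while IFS= read -r path; do
        printf '%s' "$path" | is_utf8 || printf 'bad name: %s\n' "$path"
    done
}

# check_contents FILE...: print each FILE whose contents are not valid UTF-8.
check_contents() {
    for f in "$@"; do
        is_utf8 < "$f" || printf 'not UTF-8: %s\n' "$f"
    done
}

# to_utf8 FROM FILE...: convert each FILE from encoding FROM to UTF-8 in
# place. Files that already decode as UTF-8 (including pure ASCII) are
# left alone, so running this twice does not double-encode anything.
to_utf8() {
    from=$1
    shift
    for f in "$@"; do
        if is_utf8 < "$f"; then
            continue
        fi
        iconv -f "$from" -t UTF-8 "$f" > "$f.new" && mv "$f.new" "$f"
    done
}

# Example: sh utf8-check.sh debian/tmp
if [ "$#" -gt 0 ]; then
    check_names "$1"
fi
```

The skip in to_utf8 matters on rebuilds: the plain loop in the documentation footnote, if run twice, would treat already-converted UTF-8 bytes as ISO-8859-1 and mangle them.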