seconded plus remarks on HTML encoding Re: [PROPOSAL]: encourage use of utf-8 in documentation and clarify encoding issues
Hi
Seconded
Though I think that a charset must be given in HTML unless the
document is pure ASCII or, if the DTD is actually given, Latin-1 or
UTF-8, depending on the DTD. A quick search on w3c.org reveals:
### http://www.w3.org/International/O-HTML-charset.html
#
# The base character set or document character set of HTML 4.0 and
# XML is ISO/IEC 10646 (aka. Unicode, the Universal Character
# Set). This does not mean that all HTML and XML documents have
# to be encoded in Unicode (and there would still be various
# encodings to choose from, such as e.g. UTF-8 and UTF-16). But it
# means that the logical model describing how HTML and XML are
# processed is described in terms of the UCS. The most important
# consequence is that numeric character references (&#dddd; and
# &#xhhhh;) are interpreted as Unicode.
#
# HTML 2.0 defined that all characters in an HTML document are to
# be interpreted relative to ISO 8859-1 (aka. ISO Latin 1), but
# also announced that all future versions of HTML will use a
# superset of that, viz. Unicode (or ISO 10646), which means that
# some 34000 of the world's characters are available (provided the
# software that reads/writes HTML is upgraded).
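To illustrate the point about numeric character references, here is a minimal Python sketch (my own illustration, not part of the W3C text). It assumes nothing beyond the standard library: `html.unescape` resolves decimal and hexadecimal references to the same Unicode code point, whatever the byte encoding of the file is.

```python
import html

# Decimal and hexadecimal references name the same Unicode code point (U+010D).
decimal_ref = html.unescape("&#269;")
hex_ref = html.unescape("&#x10D;")

# The reference itself consists of ASCII characters only, so it can be
# stored in a pure-ASCII document without any charset declaration.
ascii_bytes = "&#269;".encode("ascii")
```

So a name with a non-ASCII letter can still be written in a pure-ASCII HTML file, at the cost of readability of the source.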
This whole matter is so confusing that IMO everything but pure
ASCII must have a charset meta tag.
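A rough sketch of how such a rule could be checked, in Python. The helper names `is_pure_ascii` and `lacks_charset_meta` are hypothetical, and the regular expression is only a heuristic for spotting a charset declaration in a META tag, not a real HTML parser:

```python
import re

# Heuristic: a <meta ...> tag that mentions charset=<something>.
# This is an assumption for illustration, not a full HTML parser.
CHARSET_RE = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?[\w.:-]+", re.IGNORECASE)


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)


def lacks_charset_meta(data: bytes) -> bool:
    """True if the document has non-ASCII bytes but declares no charset."""
    if is_pure_ascii(data):
        return False
    return CHARSET_RE.search(data) is None
```

A pure-ASCII file passes without a declaration; anything else passes only if a charset is named somewhere in a META tag.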
ciao, 2ri
Radovan Garabik wrote:
> --- policy.sgml-old Fri Jun 1 11:40:16 2001
> +++ policy.sgml Thu Jun 7 13:31:09 2001
> @@ -1653,6 +1653,15 @@
>
>
> </sect>
> +
> + <sect id="controlencoding"><heading>Encoding of control files</heading>
> + <p>
> + If, for whatever reason (such as upstream author's or maintainer's
> + names, foreign language package description and similar), you need to
> + use characters outside 7 bit ASCII range in control files, these
> + characters should be encoded using UTF-8 encoding.
> + </p>
> + </sect>
> </chapt>
>
> <chapt id="versions"><heading>Version numbering</heading>
> @@ -2276,8 +2285,16 @@
> all.
> </p>
> </sect1>
> +
> + <sect1><heading>Character set of <tt>debian/changelog</tt></heading>
> +
> + <p>
> + Character set of <tt>debian/changelog</tt> should be either pure ASCII, or UTF-8.
> + </p>
> + </sect1>
> </sect>
>
> +
> <sect id="srcsubstvars"><heading><tt>debian/substvars</tt>
> and variable substitutions </heading>
>
> @@ -7370,6 +7387,26 @@
> from <tt>/usr/share/doc/<var>package</var>/</tt>.
> </p>
>
> + <p>
> + Documentation of debian packages in text format, if written in
> + language requiring characters outside of 7-bit ASCII range,
> + should use either well-established encoding for the given
> + language <footnote>such as ISO-8859-2 for some Central and Eastern
> + European languages, KOI8-R for Russian, etc.</footnote>, or UTF-8
> + encoding.
> + Maintainers are being encouraged to use UTF-8, having in mind
> + the general debian migration toward unified character encoding.
> + </p>
> +
> + <p>
> + Original upstream documentation, if in encoding other than UTF-8
> + or the well-established encoding for the particular language,
> + should be converted either to UTF-8 or to the well-established
> + encoding. Choice between UTF-8 and other encoding is left to the
> + maintainer discretion, however, one package should have all the
> + documentation in one consistent encoding for one language.
> + </p>
> +
> </sect>
>
> <sect id="usrdoc">
> @@ -7440,6 +7477,18 @@
> Other formats such as PostScript may be provided at the
> package maintainer's discretion.
> </p>
> +
> + <p>
> + HTML documents, if in encoding other than <tt>us-ascii</tt>, should
> + have in their header an appropriate META tag describing
> + the used encoding.
> +
> + Example:
> + <example>
> + <META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
> + </example>
> + </p>
> +
> </sect>
>
> <sect id="copyrightfile">
> @@ -7555,6 +7604,24 @@
> changelog, then the Debian changelog should still be called
> <tt>changelog.Debian.gz</tt>.</p>
> </sect>
> +
> + <sect id="charset">
> + <heading>Default character set</heading>
> +
> + <p>
> + Names of maintainers, upstream authors and other data in
> + packages' descriptions and related debian data files (such as
> + <tt>debian/changelog</tt>, <tt>debian/copyright</tt>,
> + <tt>debian/control</tt>), as well as in English language
> + documentation, should be either transliterated or
> + transcribed to ASCII, or used in UTF-8 encoding at the
> + discretion of the maintainer. However, for names
> + in scripts based on non-latin alphabets, ASCII (or suitable
> + latin-script) version should be provided along with original
> + name.
> + </p>
> + </sect>
> +
> </chapt>
>
> <appendix id="pkg-scope">
>
>
>
--
All constants are variables.