
seconded plus remarks on HTML encoding Re: [PROPOSAL]: encourage use of utf-8 in documentation and clarify encoding issues



Hi

Seconded

Though I think that a charset must be given in HTML if it is not
ASCII or, if the DTD is actually given, Latin-1 or UTF-8
respectively. A quick search on w3c.org reveals:

### http://www.w3.org/International/O-HTML-charset.html
#
# The base character set or document character set of HTML 4.0 and
# XML is ISO/IEC 10646 (aka. Unicode, the Universal Character
# Set).  This does not mean that all HTML and XML documents have
# to be encoded in Unicode (and there would still be various
# encodings to choose from, such as e.g. UTF-8 and UTF-16). But it
# means that the logical model describing how HTML and XML are
# processed is described in terms of the UCS. The most important
# consequence is that numeric character references (&#dddd; and
# &#xhhhh;) are interpreted as Unicode.
# 
# HTML 2.0 defined that all characters in an HTML document are to
# be interpreted relative to ISO 8859-1 (aka. ISO Latin 1), but
# also announced that all future versions of HTML will use a
# superset of that, viz.  Unicode (or ISO 10646), which means that
# some 34000 of the world's characters are available (provided the
# software that reads/writes HTML is upgraded).
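
To illustrate the point about numeric character references, here is a minimal Python sketch. It is not taken from the W3C page; the sample string and the `resolve` helper are made up for illustration. It shows that a reference such as `&#233;` resolves to the same Unicode character whichever byte encoding the file is stored in.

```python
import html


def resolve(raw: bytes, encoding: str) -> str:
    """Decode stored bytes, then expand character references.

    `resolve` is a hypothetical helper, not part of any HTML tool.
    """
    return html.unescape(raw.decode(encoding))


# Pure-ASCII markup that names two non-ASCII characters by code point.
SAMPLE = "caf&#233; costs 2 &#x20AC;"

for encoding in ("ascii", "iso-8859-1", "utf-8"):
    raw = SAMPLE.encode(encoding)  # bytes as they would sit on disk
    print(encoding, resolve(raw, encoding))
```

All three iterations print the same text, because the references are interpreted as Unicode code points and not as byte values in the document's encoding.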

This whole thing is so confusing that IMO everything but pure
ASCII must have a charset meta.
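
A rough sketch of how such a rule could be checked, assuming Python 3.7 or later. The names `MetaCharsetFinder` and `needs_charset_meta` are made up here and nothing like this exists in the Debian tools as far as I know. It only looks for the HTTP-Equiv form of the META tag used in the proposal.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset declared by a Content-Type META tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        http_equiv = (attrs.get("http-equiv") or "").lower()
        content = attrs.get("content") or ""
        lowered = content.lower()
        if http_equiv == "content-type" and "charset=" in lowered:
            start = lowered.index("charset=") + len("charset=")
            self.charset = content[start:].split(";")[0].strip() or None


def needs_charset_meta(raw: bytes) -> bool:
    """Return True if the document has non-ASCII bytes but no charset META."""
    if raw.isascii():
        return False
    finder = MetaCharsetFinder()
    # ISO-8859-1 maps every byte to a character, so this decode never
    # fails; it is only used to locate the (ASCII) META tag.
    finder.feed(raw.decode("iso-8859-1"))
    finder.close()
    return finder.charset is None
```

A maintainer could run `needs_charset_meta(open(path, "rb").read())` over the HTML files of a package and flag every file for which it returns True.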

ciao, 2ri

Radovan Garabik wrote:
> --- policy.sgml-old	Fri Jun  1 11:40:16 2001
> +++ policy.sgml	Thu Jun  7 13:31:09 2001
> @@ -1653,6 +1653,15 @@
>  
>  
>        </sect>
> +
> +      <sect id="controlencoding"><heading>Encoding of control files</heading>
> +	<p>
> +            If, for whatever reason (such as upstream author's or maintainer's
> +            names, foreign language package description and similar), you need to
> +            use characters outside 7 bit ASCII range in control files, these
> +            characters should be encoded using UTF-8 encoding.
> +	</p>
> +      </sect>
>      </chapt>
>  
>      <chapt id="versions"><heading>Version numbering</heading>
> @@ -2276,8 +2285,16 @@
>  	    all.
>  	  </p>
>  	</sect1>
> +
> +	<sect1><heading>Character set of <tt>debian/changelog</tt></heading>
> +
> +	  <p>
> +            Character set of <tt>debian/changelog</tt> should be either pure ASCII, or UTF-8.
> +	  </p>
> +	</sect1>
>        </sect>
>  
> +
>        <sect id="srcsubstvars"><heading><tt>debian/substvars</tt>
>  	  and variable substitutions	  </heading>
>  
> @@ -7370,6 +7387,26 @@
>  	  from <tt>/usr/share/doc/<var>package</var>/</tt>.
>  	</p>
>  
> +	<p>
> +          Documentation of debian packages in text format, if written in
> +          language requiring characters outside of 7-bit ASCII range,
> +          should use either well-established encoding for the given
> +          language <footnote>such as ISO-8859-2 for some central and eastern 
> +          European languages, KOI8-R for Russian, etc.</footnote>, or UTF-8 
> +          encoding.
> +          Maintainers are being encouraged to use UTF-8, having in mind
> +          the general debian migration toward unified character encoding.
> +	</p>
> +
> +	<p>
> +          Original upstream documentation, if in encoding other than UTF-8
> +          or the well-established encoding for the particular language,
> +          should be converted either to UTF-8 or to the well-established
> +          encoding. Choice between UTF-8 and other encoding is left to the
> +          maintainer discretion, however, one package should have all the
> +          documentation in one consistent encoding for one language.
> +	</p>
> +        
>        </sect>
>  
>        <sect id="usrdoc">
> @@ -7440,6 +7477,18 @@
>  	  Other formats such as PostScript may be provided at the
>  	  package maintainer's discretion.
>  	</p>
> +
> +        <p>
> +          HTML documents, if in encoding other than <tt>us-ascii</tt>, should
> +          have in their header an appropriate META tag describing 
> +          the used encoding.
> +          
> +          Example:
> +          <example>
> +            &lt;META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=UTF-8"&gt;
> +          </example>
> +        </p>
> +        
>        </sect>
>  
>        <sect id="copyrightfile">
> @@ -7555,6 +7604,24 @@
>  	  changelog, then the Debian changelog should still be called
>  	  <tt>changelog.Debian.gz</tt>.</p>
>        </sect>
> +
> +      <sect id="charset">
> +	<heading>Default character set</heading>
> +
> +	<p>
> +          Names of maintainers, upstream authors and other data in
> +          packages' descriptions and related debian data files (such as
> +          <tt>debian/changelog</tt>, <tt>debian/copyright</tt>, 
> +          <tt>debian/control</tt>), as well as in English language 
> +          documentation, should be either transliterated or 
> +          transcribed to ASCII, or used in UTF-8 encoding at the 
> +          discretion of the maintainer. However, for names
> +          in scripts based on non-latin alphabets, ASCII (or suitable
> +          latin-script) version should be provided along with original
> +          name.
> +        </p>
> +       </sect>
> +
>      </chapt>
>  
>      <appendix id="pkg-scope">
> 
> 
> 

-- 
All constants are variables.


