
Bug#174982: [PROPOSAL]: Debian changelogs should be UTF-8 encoded



[ No need to CC me; I am subscribed to -policy ]

On Thu, 2003-01-02 at 00:23, David B Harris wrote:

> Could you provide a quick background about what Unicode is

Sure.  Essentially, Unicode is a universal character set, designed to
encode all the world's languages, plus other symbols from mathematics
and the like.  It is intended to supplant older character sets such as
US-ASCII, ISO-8859-1 and Big5, which were designed for American English,
Western European languages, and Traditional Chinese, respectively.
Because a single character set covers everything, Unicode makes
internationalization and multilingualization much easier.

> and how it
> co-operates with 7-bit ASCII? 

The UTF-8 encoding of Unicode (the translation from code point number
into a sequence of bytes) is completely backwards compatible with
US-ASCII: every ASCII character is encoded as the same single byte.
Moreover, no ASCII byte ever appears as part of a multibyte sequence,
which makes it filesystem safe, for example (a '/' byte always means a
real '/').
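
To make those two properties concrete, here is a minimal Python sketch.
The function names are made up for this illustration and are not part
of the proposal or of any Debian tool; it only uses the standard
str.encode() method.

```python
def is_ascii_compatible(text):
    """True if the UTF-8 bytes of pure-ASCII text equal its ASCII bytes."""
    return text.encode("utf-8") == text.encode("ascii")


def has_ascii_byte_in_multibyte(text):
    """True if any multibyte UTF-8 sequence in text contains a byte < 0x80.

    For valid UTF-8 this should never happen: every byte of a multibyte
    sequence has its high bit set.
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return True
    return False


if __name__ == "__main__":
    # A plain ASCII changelog line is byte-for-byte identical in UTF-8.
    print(is_ascii_compatible("Closes: #174982"))
    # Non-ASCII characters never produce bytes that look like ASCII,
    # so a byte such as '/' (0x2F) cannot hide inside one of them.
    print(has_ascii_byte_in_multibyte("é € 漢"))
```

In other words, an existing all-ASCII changelog is already valid UTF-8,
and byte-oriented tools that look for ASCII delimiters keep working.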

See the URL I gave in the patch for more information:
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html

I hope that helps!




