Bug#99324: Default charset should be UTF-8

To: Steve Greenland <stevegr@debian.org>, 99324@bugs.debian.org
Subject: Bug#99324: Default charset should be UTF-8
From: Raul Miller <moth@debian.org>
Date: Fri, 1 Jun 2001 19:30:32 -0400
Message-id: <991438015.080badb9@debian.org>
Reply-to: Raul Miller <moth@debian.org>, 99324@bugs.debian.org
In-reply-to: <20010601180843.A31602@molehole.moregruel.net>; from stevegr@debian.org on Fri, Jun 01, 2001 at 06:08:43PM -0500
References: <991427030.acf56e08@debian.org> <20010601180843.A31602@molehole.moregruel.net>

On Fri, Jun 01, 2001 at 06:08:43PM -0500, Steve Greenland wrote:
> At present cron parses the command simply by reading everything up
> to the end of the line ('\n'), char by char (in the C type sense of
> 'char'). Is there a guarantee that byte value representing '\n' won't
> show up in the sequence?

Yes. UTF-8 is an ASCII-compatible encoding of Unicode, so the byte 0x0A
can only ever mean '\n'. Markus Kuhn's UTF-8 and Unicode FAQ
(http://www.cl.cam.ac.uk/~mgk25/unicode.html) describes this aspect of
UTF-8 fairly well:

     * UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes
       0x00 to 0x7F (ASCII compatibility). This means that files and strings
       which contain only 7-bit ASCII characters have the same encoding under
       both ASCII and UTF-8.
     * All UCS characters >U+007F are encoded as a sequence of several bytes,
       each of which has the most significant bit set. Therefore, no ASCII
       byte (0x00-0x7F) can appear as part of any other character.
...
     * The bytes 0xFE and 0xFF are never used in the UTF-8 encoding.
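A minimal sketch in C of the property quoted above, assuming standard UTF-8 as specified for code points U+0000..U+10FFFF (surrogates excluded). The helpers utf8_encode, no_ascii_in_multibyte and command_length are hypothetical names for illustration only; command_length merely approximates a "read bytes until '\n'" loop and is not cron's actual parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out[].
 * Returns the number of bytes written (1-4), or 0 if cp is not a
 * valid Unicode scalar value (surrogate or above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                 /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) /* surrogates are not encodable */
        return 0;
    if (cp <= 0xFFFF) {               /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {             /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical check: for every code point above U+007F, each encoded
 * byte has its high bit set (so it can never equal '\n' = 0x0A or any
 * other ASCII byte) and is never 0xFE or 0xFF. Returns 1 if it holds. */
static int no_ascii_in_multibyte(void)
{
    unsigned char buf[4];
    for (uint32_t cp = 0x80; cp <= 0x10FFFF; cp++) {
        size_t n = utf8_encode(cp, buf);
        for (size_t i = 0; i < n; i++) {
            if (buf[i] < 0x80 || buf[i] == 0xFE || buf[i] == 0xFF)
                return 0;
        }
    }
    return 1;
}

/* Hypothetical cron-style scan: length in bytes of the command up to
 * (not including) the first '\n' or the terminating NUL. Because of
 * the property above, UTF-8 text cannot cut the command short. */
static size_t command_length(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0' && s[len] != '\n')
        len++;
    return len;
}
```

For example, command_length("echo caf\xc3\xa9\nnext") stops at the newline and counts the two bytes of U+00E9 as part of the command, giving 10. No change to a byte-oriented '\n' scan is needed to accept UTF-8 input.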

-- 
Raul