Bug#99324: Default charset should be UTF-8
On Fri, Jun 01, 2001 at 06:08:43PM -0500, Steve Greenland wrote:
> At present cron parses the command simply by reading everything up
> to the end of the line ('\n'), char by char (in the C type sense of
> 'char'). Is there a guarantee that the byte value representing '\n' won't
> show up in the sequence?
Yes. UTF-8 is a backward-compatible extension of ASCII, so the byte 0x0A
can only ever mean '\n'. Markus Kuhn's UTF-8 and Unicode FAQ
(http://www.cl.cam.ac.uk/~mgk25/unicode.html) describes this aspect of
UTF-8 fairly well:
* UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes
  0x00 to 0x7F (ASCII compatibility). This means that files and strings
  which contain only 7-bit ASCII characters have the same encoding under
  both ASCII and UTF-8.
* All UCS characters >U+007F are encoded as a sequence of several bytes,
  each of which has the most significant bit set. Therefore, no ASCII
  byte (0x00-0x7F) can appear as part of any other character.
* The bytes 0xFE and 0xFF are never used in the UTF-8 encoding.
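
To make the properties above concrete, here is a minimal Python sketch of a
byte-by-byte scan for '\n' over UTF-8 input. It is not cron's actual code
(cron is written in C); read_command and multibyte_bytes are hypothetical
helper names, and the sample string is made up for illustration.

```python
def read_command(buf: bytes) -> bytes:
    """Return everything up to (not including) the first 0x0A byte.

    Hypothetical stand-in for a parser that reads "char by char"
    until end of line; it knows nothing about UTF-8.
    """
    out = bytearray()
    for b in buf:
        if b == 0x0A:  # '\n'
            break
        out.append(b)
    return bytes(out)


def multibyte_bytes(text: str) -> list[int]:
    """Collect every byte belonging to a non-ASCII character's encoding."""
    return [b for ch in text if ord(ch) > 0x7F for b in ch.encode("utf-8")]


# Made-up crontab-style line containing 2-, 3- and 4-byte characters.
line = "echo grüße 日本語 € 😀\nnext line".encode("utf-8")

# The naive byte scan stops at the real newline and nowhere else.
command = read_command(line)
print(command.decode("utf-8"))

# Every byte of a multi-byte sequence has its most significant bit set,
# so none of them can be mistaken for 0x0A (or any other ASCII byte).
high = multibyte_bytes("echo grüße 日本語 € 😀")
assert all(b >= 0x80 for b in high)

# 0xFE and 0xFF never occur in UTF-8 output.
assert 0xFE not in high and 0xFF not in high
```

The scan never has to decode anything: because the truncated command still
ends on a character boundary, it decodes cleanly as UTF-8 afterwards.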