
Bug#99324: Default charset should be UTF-8



On Fri, Jun 01, 2001 at 06:08:43PM -0500, Steve Greenland wrote:
> At present cron parses the command simply by reading everything up
> to the end of the line ('\n'), char by char (in the C type sense of
> 'char'). Is there a guarantee that byte value representing '\n' won't
> show up in the sequence?

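Yes.  Unicode is a compatible extension of ASCII, and UTF-8 preserves
that at the byte level: the byte for '\n' (0x0A) never occurs inside a
multi-byte character.  Markus Kuhn's UTF-8 and Unicode FAQ
(http://www.cl.cam.ac.uk/~mgk25/unicode.html) describes this aspect of
UTF-8 fairly well: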

     * UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes
       0x00 to 0x7F (ASCII compatibility). This means that files and strings
       which contain only 7-bit ASCII characters have the same encoding under
       both ASCII and UTF-8.
     * All UCS characters >U+007F are encoded as a sequence of several bytes,
       each of which has the most significant bit set. Therefore, no ASCII
       byte (0x00-0x7F) can appear as part of any other character.
...
     * The bytes 0xFE and 0xFF are never used in the UTF-8 encoding.
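
To illustrate the properties quoted above, here is a minimal sketch of a
byte-by-byte scan for the end of a line. It is not cron's actual code;
the function name line_length is hypothetical, and the input is assumed
to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from cron: return the number of bytes
 * before the first '\n' (0x0A) in buf, or len if there is no newline.
 * The scan looks at one byte at a time and knows nothing about UTF-8.
 */
size_t line_length(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);

    /* Every byte of a multi-byte UTF-8 character has its most
     * significant bit set (value >= 0x80), so none of them can
     * compare equal to '\n'. */
    while (i < len && buf[i] != '\n')
        i++;

    return i;
}
```

For example, U+010A is encoded in UTF-8 as the two bytes 0xC4 0x8A, so
a buffer holding 0xC4 0x8A 0xC3 0xA9 0x0A is scanned up to the real
newline and line_length returns 4. Neither multi-byte character
contributes a byte equal to 0x0A.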

-- 
Raul


