
Re: charsets in debian/control



On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote:
> On that note, how likely is it to hit a UTF-8 character encoding that
> contains a '\n'? Any non UTF-8 aware parser would assume a new line
> has started and get parse errors.

0% likely, guaranteed.

UTF-8 is *designed* to be upwards compatible with plain ASCII. Every
valid ASCII character has the same encoding and meaning in UTF-8, and no
UTF-8 byte sequence for a non-ASCII character contains *any* ASCII bytes.

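This is achieved by making sure that every byte in the encoding of a
non-ASCII character has the high bit set: not just the first byte, but
all of them. A '\n' (0x0A) has the high bit clear, so it can never be
part of a multi-byte sequence.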
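
If you want to convince yourself, here is a minimal Python sketch that brute-forces the claim over every encodable code point. The function names are made up for illustration, and it relies only on Python's built-in UTF-8 codec; it is a sanity check, not a proof of the encoding's design.

```python
def utf8_has_ascii_byte(ch: str) -> bool:
    """Return True if the UTF-8 encoding of ch contains a byte below 0x80."""
    return any(b < 0x80 for b in ch.encode("utf-8"))


def find_offenders() -> list:
    """Collect non-ASCII code points whose UTF-8 encoding contains an ASCII byte."""
    offenders = []
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not valid scalar values and cannot be encoded
        if utf8_has_ascii_byte(chr(cp)):
            offenders.append(cp)
    return offenders


if __name__ == "__main__":
    # An empty list means no non-ASCII character ever encodes to a byte
    # that a byte-oriented parser could mistake for '\n' (or any other ASCII).
    print(find_offenders())
```

So a parser that splits on the byte 0x0A without knowing anything about UTF-8 still finds exactly the real line breaks.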

-- 
Bart.


