Re: charsets in debian/control
On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote:
> On that note, how likely is it to hit a UTF-8 character encoding that
> contains a '\n'? Any non UTF-8 aware parser would assume a new line
> has started and get parse errors.
0% likely, guaranteed.
UTF-8 is *designed* to be upwards compatible with plain ASCII. Every
ASCII character is encoded as the same single byte in UTF-8. The byte
sequence for a non-ASCII character never contains *any* byte in the
ASCII range (0x00-0x7F), so it can never contain '\n' (0x0A). This is
achieved by setting the high bit on every byte of the sequence, not just
on the first one.
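If you want to convince yourself, here is a quick Python 3 sketch that
brute-forces it. The function name non_ascii_bytes_all_high and the
sample control line are made up for illustration; this is not code from
dpkg or any Debian tool. It skips the surrogate range U+D800-U+DFFF,
which strict UTF-8 refuses to encode anyway.

```python
import sys


def non_ascii_bytes_all_high(limit: int = sys.maxunicode) -> bool:
    """Hypothetical helper: True if every non-ASCII code point up to
    `limit` encodes to UTF-8 bytes that all have the high bit set."""
    for cp in range(0x80, limit + 1):
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogates are not encodable characters; skip them.
            continue
        encoded = chr(cp).encode("utf-8")
        if not all(b >= 0x80 for b in encoded):
            return False
    return True


# Every ASCII character encodes to the same single byte in UTF-8.
ascii_bytes = bytes(range(0x80))
assert ascii_bytes.decode("ascii").encode("utf-8") == ascii_bytes

# No non-ASCII character ever produces a byte below 0x80 (so no 0x0A).
assert non_ascii_bytes_all_high()

# Consequence: a byte-oriented parser that splits on b"\n" sees exactly
# the same lines as one that decodes the UTF-8 first.
text = "Maintainer: Jörg Müller\nDescription: café\n"
raw = text.encode("utf-8")
assert [line.decode("utf-8") for line in raw.split(b"\n")] == text.split("\n")
```

The full scan over all code points takes a second or two; the point is
that the newline byte can only come from an actual newline.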