Re: charsets in debian/control



Bart Schuller <schuller@lunatech.com> writes:

> On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote:
>> On that note, how likely is it to hit a UTF-8 character encoding that
>> contains a '\n'? Any non UTF-8 aware parser would assume a new line
>> has started and get parse errors.
>
> 0% likely, guaranteed.
>
> UTF-8 is *designed* to be upwards compatible with plain ASCII. Every
> valid ASCII character has the same meaning in UTF-8. Every UTF-8 byte
> sequence for a non-ASCII character will not contain *any* ASCII characters.
>
> This is achieved by making sure that everything above plain ASCII has
> the high bit set, not just for the first byte, but for all of them.

OK, so no problems there. Any parser that accepts 8-bit non-ASCII
characters will accept UTF-8 as well. What remains is just getting the
UTF-8 characters to display correctly.
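
For anyone who wants to convince themselves, here is a rough Python
sketch of the check. The function name and the sample Description field
are made up for illustration; nothing here comes from dpkg or any real
control-file parser. It encodes every code point above ASCII and counts
those whose UTF-8 encoding contains a byte below 0x80, then splits a
sample field on the raw newline byte the way a non-UTF-8-aware parser
would.

```python
def count_encodings_with_ascii_bytes() -> int:
    """Count non-ASCII code points whose UTF-8 encoding contains a byte < 0x80."""
    offenders = 0
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            offenders += 1
    return offenders


# Hypothetical field value containing two- and three-byte characters.
sample = "Description: caf\u00e9 \u2603 na\u00efve"
raw = sample.encode("utf-8")

# A byte-oriented parser that only knows about b"\n" sees a single line.
byte_lines = raw.split(b"\n")

# Every byte of a non-ASCII character has the high bit set.
snowman_bytes = "\u2603".encode("utf-8")
```

If the claim above holds, the count is zero and the sample still parses
as a single line when split on bytes.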

MfG
        Goswin


