
Re: UTF-8 in jessie



Adam Borowski writes ("UTF-8 in jessie"):
> I would like to propose full UTF-8 support.  I don't mean here full
> support for all of Unicode's finer points, merely complete eradication of
> mojibake.  That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> as equal to "a""combining ¨" is out of scope of this proposal.

I agree with everything you propose except that I have one reservation
regarding this:

> 4. all text files should be encoded in UTF-8

I agree with this, except that I think a text file should also be
permitted to use ASCII codepoints.

You may say "but UTF-8 is a superset of ASCII".  Well, no, it isn't.
UTF-8 is a superset of ISO-646 but ISO-646 is not identical to ASCII.
In particular, the descriptions of the codepoints ` ' in ISO-646
effectively forbid them from being used as matching single quotes,
despite that being specified as allowed in ASCII.

I don't think that better UTF-8 support should involve needlessly
converting 7-bit ASCII text files which use ` ' as matched quotes,
into UTF-8 text files which use non-ISO-646 codepoints.
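
As a rough illustration of the point above, here is a minimal Python
sketch. The function `curl_quotes` and its regex are hypothetical,
invented here to stand in for the kind of conversion being objected to;
they are not part of the proposal. It shows that a 7-bit file using ` '
as matched quotes already decodes as UTF-8 unchanged, and that rewriting
the quotes is what makes the file stop being 7-bit clean.

```python
import re


def curl_quotes(text: str) -> str:
    """Hypothetical conversion: rewrite `...' as U+2018 ... U+2019."""
    return re.sub(r"`([^`'\n]*)'", "\u2018\\1\u2019", text)


ascii_text = "Use `foo' here."
raw = ascii_text.encode("ascii")

# The 7-bit bytes decode as UTF-8 without any change to the file.
assert raw.decode("utf-8") == ascii_text
assert raw == ascii_text.encode("utf-8")

# After the conversion the text contains codepoints above 127,
# so its UTF-8 encoding is no longer 7-bit clean.
converted = curl_quotes(ascii_text)
is_seven_bit = all(b < 0x80 for b in converted.encode("utf-8"))
```

Here `is_seven_bit` is computed on the converted text only; the original
bytes in `raw` are left untouched, which is the behaviour argued for.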

(In fact I would like to see Markus Kuhn's decision about ` ' reversed
- our default character set should be ASCII for 0..127 plus UTF for
the rest.  That's not an argument I expect to win but at the very
least we shouldn't have to worsify things for ASCII users.)

Thanks,
Ian.

