Re: UTF-8 in jessie
Adam Borowski writes ("UTF-8 in jessie"):
> I would like to propose full UTF-8 support. I don't mean here full
> support for all of Unicode's finer points, merely complete eradication of
> mojibake. That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> as equal to "a""combining ¨" is out of scope of this proposal.
I agree with everything you propose except that I have one reservation
regarding this:
> 4. all text files should be encoded in UTF-8
I agree with this, except that I think a text file should also be
permitted to use ASCII codepoints.
You may say "but UTF-8 is a superset of ASCII". Well, no, it isn't.
UTF-8 is a superset of ISO-646 but ISO-646 is not identical to ASCII.
In particular the descriptions of the codepoints ` ' in ISO-646
effectively forbid them from being used as matching single quotes,
despite that being specified as allowed in ASCII.
I don't think that better UTF-8 support should involve needlessly
converting 7-bit ASCII text files which use ` ' as matched quotes,
into UTF-8 text files which use non-ISO-646 codepoints.
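To make the byte-level point concrete, here is a minimal Python sketch.
The helper names and the sample string are invented for illustration,
and curl_quotes() is only a guess at what such a conversion might look
like, not something anyone has proposed in this form. It shows that a
7-bit file using ` ' as quotes decodes identically as ASCII and as
UTF-8, whereas rewriting the quotes as U+2018/U+2019 produces a file
that is no longer 7-bit:

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte is in the range 0..127."""
    return all(b < 0x80 for b in data)


def curl_quotes(text: str) -> str:
    """Hypothetical conversion: replace ` and ' with U+2018 and U+2019."""
    return text.replace("`", "\u2018").replace("'", "\u2019")


# Sample text using ` ' as matched quotes (made up for this example).
original = "Use `foo' to frob the widget.\n"
raw = original.encode("ascii")

# The same bytes decode to the same string under either encoding.
same = raw.decode("ascii") == raw.decode("utf-8")

# After the conversion the file contains multi-byte sequences.
converted = curl_quotes(original).encode("utf-8")

print(is_seven_bit(raw), same, is_seven_bit(converted))
```

So the unconverted file already passes as UTF-8 at the byte level; the
conversion is what takes it out of the 7-bit range.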
(In fact I would like to see Markus Kuhn's decision about ` ' reversed
- our default character set should be ASCII for 0..127 plus UTF for
the rest. That's not an argument I expect to win but at the very
least we shouldn't have to worsify things for ASCII users.)
Thanks,
Ian.