
Re: UTF-8 in jessie



On Wed, Aug 28, 2013 at 04:20:17PM +0100, Ian Jackson wrote:
> Adam Borowski writes ("UTF-8 in jessie"):
> > I would like to propose full UTF-8 support.  I don't mean here full
> > support for all of Unicode's finer points, merely complete eradication of
> > mojibake.  That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> > as equal to "a""combining ¨" is out of scope of this proposal.
> 
> I agree with everything you propose except that I have one reservation
> regarding this:
> 
> > 4. all text files should be encoded in UTF-8
> 
> I agree with this except that I think it should be permitted that a
> text file uses ASCII codepoints.
> 
> You may say "but UTF-8 is a superset of ASCII".  Well, no, it isn't.

Uhm, how?

> UTF-8 is a superset of ISO-646 but ISO-646 is not identical to ASCII.
> In particular the descriptions of the codepoints ` ' in ISO-646
> effectively forbids them from being used as matching single quotes,
> despite that being specified as allowed in ASCII.

Let's take a look at some sheets.

Feb 1972:
https://en.wikipedia.org/wiki/File:ASCII_Code_Chart-Quick_ref_card.jpg

1967/68:
http://www.samhallas.co.uk/repository/telegraph/teletype_33_specs.pdf

` and ' don't look like anything resembling matching quotes to me.
Usually ' is vertical or slightly slanted, ` tends to be at 45 degrees
or quite close to horizontal.

> I don't think that better UTF-8 support should involve needlessly
> converting 7-bit ASCII text files which use ` ' as matched quotes,
> into UTF-8 text files which use non-ISO-646 codepoints.

These code points (0x60 and 0x27) are defined to be exactly the same in both
ASCII and Unicode.  Only the glyphs that fonts draw for them may differ.  And
like Han unification issues, that is out of scope here.
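
For what it's worth, here is a quick sanity check of that claim, as a rough
sketch in Python 3.  The helper name same_in_ascii_and_utf8 is made up for
illustration and is not part of any library; the only thing it assumes is
that the input is a single 7-bit character.

```python
import unicodedata


def same_in_ascii_and_utf8(ch: str) -> bool:
    """Return True if ch encodes to identical bytes as ASCII and as UTF-8.

    Hypothetical helper for this illustration only.
    """
    return ch.encode("ascii") == ch.encode("utf-8")


# The two characters under discussion: grave accent and apostrophe.
for ch in "`'":
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"ascii={ch.encode('ascii').hex()} utf8={ch.encode('utf-8').hex()} "
        f"same={same_in_ascii_and_utf8(ch)}"
    )

# The same holds for every 7-bit value, not just these two.
all_seven_bit = bytes(range(128))
print(all_seven_bit.decode("ascii") == all_seven_bit.decode("utf-8"))
```

So a 7-bit file that uses ` ' as quotes is byte-for-byte the same file when
read as UTF-8; nothing needs converting unless someone chooses to replace
the characters themselves.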

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ

