Re: UTF-8 in jessie

To: debian-devel@lists.debian.org
Subject: Re: UTF-8 in jessie
From: Adam Borowski <kilobyte@angband.pl>
Date: Wed, 28 Aug 2013 20:07:12 +0200
Message-id: <20130828180712.GA8137@angband.pl>
In-reply-to: <21022.5425.511942.342379@chiark.greenend.org.uk>
References: <20130812005152.GA28636@angband.pl> <21022.5425.511942.342379@chiark.greenend.org.uk>

On Wed, Aug 28, 2013 at 04:20:17PM +0100, Ian Jackson wrote:
> Adam Borowski writes ("UTF-8 in jessie"):
> > I would like to propose full UTF-8 support.  I don't mean here full
> > support for all of Unicode's finer points, merely complete eradication of
> > mojibake.  That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> > as equal to "a""combining ¨" is out of scope of this proposal.
> 
> I agree with everything you propose except that I have one reservation
> regarding this:
> 
> > 4. all text files should be encoded in UTF-8
> 
> I agree with this except that I think it should be permitted that a
> text file uses ASCII codepoints.
> 
> You may say "but UTF-8 is a superset of ASCII".  Well, no, it isn't.

Uhm, how?

> UTF-8 is a superset of ISO-646 but ISO-646 is not identical to ASCII.
> In particular the descriptions of the codepoints ` ' in ISO-646
> effectively forbids them from being used as matching single quotes,
> despite that being specified as allowed in ASCII.

Let's take a look at some sheets.

Feb 1972:
https://en.wikipedia.org/wiki/File:ASCII_Code_Chart-Quick_ref_card.jpg

1967/68:
http://www.samhallas.co.uk/repository/telegraph/teletype_33_specs.pdf

` and ' don't look like anything resembling matching quotes to me.
Usually ' is vertical or slightly slanted, ` tends to be at 45 degrees
or quite close to horizontal.

> I don't think that better UTF-8 support should involve needlessly
> converting 7-bit ASCII text files which use ` ' as matched quotes,
> into UTF-8 text files which use non-ISO-646 codepoints.

These code points are defined to be exactly the same in both ASCII and
Unicode.  Only fonts may differ.  And like Han unification issues, this
is out of scope here.
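
If you'd rather check than take my word for it, here is a rough sketch
in Python 3.  The helper name same_bytes is made up for illustration;
only the stock str.encode and unicodedata.name calls are used.  It
shows that ` and ' get the same single byte whether you encode them as
ASCII or as UTF-8, and that this holds for every 7-bit value.

```python
import unicodedata


def same_bytes(text: str) -> bool:
    """Return True if text encodes to identical bytes in ASCII and UTF-8.

    Raises UnicodeEncodeError if text contains non-ASCII characters.
    """
    return text.encode("ascii") == text.encode("utf-8")


# The two characters under discussion, with their Unicode names.
for ch in "`'":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"ascii={ch.encode('ascii').hex()} utf-8={ch.encode('utf-8').hex()}")

# A 7-bit file that uses ` ' as quotes needs no conversion at all.
assert same_bytes("`quoted text'")

# The same holds for all 128 ASCII values, in both directions.
all_ascii = bytes(range(128))
assert all_ascii.decode("ascii") == all_ascii.decode("utf-8")
assert same_bytes(all_ascii.decode("ascii"))
```

So a pure 7-bit file is already valid UTF-8 byte for byte; how the two
glyphs are drawn is up to the font.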

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
