Re: UTF-8 in jessie

To: Adam Borowski <kilobyte@angband.pl>
Cc: debian-devel@lists.debian.org
Subject: Re: UTF-8 in jessie
From: Ian Jackson <ijackson@chiark.greenend.org.uk>
Date: Wed, 28 Aug 2013 16:20:17 +0100
Message-id: <21022.5425.511942.342379@chiark.greenend.org.uk>
In-reply-to: <20130812005152.GA28636@angband.pl>
References: <20130812005152.GA28636@angband.pl>

Adam Borowski writes ("UTF-8 in jessie"):
> I would like to propose full UTF-8 support.  I don't mean here full
> support for all of Unicode's finer points, merely complete eradication of
> mojibake.  That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> as equal to "a""combining ¨" is out of scope of this proposal.

I agree with everything you propose except that I have one reservation
regarding this:

> 4. all text files should be encoded in UTF-8

I agree with this, except that I think a text file should be
permitted to use ASCII codepoints.

You may say "but UTF-8 is a superset of ASCII".  Well, no, it isn't.
UTF-8 is a superset of ISO-646 but ISO-646 is not identical to ASCII.
In particular, the descriptions of the codepoints ` ' in ISO-646
effectively forbid them from being used as matching single quotes,
despite that use being specified as allowed in ASCII.

I don't think that better UTF-8 support should involve needlessly
converting 7-bit ASCII text files which use ` ' as matched quotes,
into UTF-8 text files which use non-ISO-646 codepoints.
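
To illustrate the point about not converting such files, here is a
minimal sketch in Python.  It assumes a hypothetical policy check
(the names is_seven_bit and needs_reencoding are made up for this
example and come from no existing tool): a file whose bytes are all
in 0..127 already decodes byte-for-byte the same under a UTF-8
decoder, so it can be left exactly as it is, ` ' quotes included.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit range 0..127."""
    return all(b < 0x80 for b in data)


def needs_reencoding(data: bytes) -> bool:
    """Hypothetical policy: only flag files that are not valid UTF-8.

    A 7-bit file is never flagged, so its ` ' quotes are left alone.
    """
    if is_seven_bit(data):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Some legacy 8-bit encoding (e.g. Latin-1): a conversion candidate.
        return True
    return False


# A 7-bit file that uses ` ' as matched quotes.
sample = b"Use `quoted' text like this.\n"

# The bytes decode identically either way, so nothing has to be rewritten.
assert sample.decode("ascii") == sample.decode("utf-8")
assert not needs_reencoding(sample)
```

Under this sketch only files containing bytes outside 0..127 that
fail to decode as UTF-8 would be candidates for conversion.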

(In fact I would like to see Markus Kuhn's decision about ` ' reversed
- our default character set should be ASCII for 0..127 plus UTF for
the rest.  That's not an argument I expect to win but at the very
least we shouldn't have to worsify things for ASCII users.)

Thanks,
Ian.
