Re: UTF-8 in jessie

To: Adam Borowski <kilobyte@angband.pl>, debian-devel@lists.debian.org
Subject: Re: UTF-8 in jessie
From: Johannes Schauer <j.schauer@email.de>
Date: Mon, 14 Oct 2013 12:50:58 +0200
Message-id: <20131014105058.7934.26083@hoothoot>
Mail-followup-to: Adam Borowski <kilobyte@angband.pl>, debian-devel@lists.debian.org
In-reply-to: <20130812005152.GA28636@angband.pl>
References: <20130812005152.GA28636@angband.pl>

Hi,

Quoting Adam Borowski (2013-08-12 02:51:52)
> On Mon, May 06, 2013 at 02:49:57PM +0200, Andreas Beckmann wrote:
> > now might be the right time to start a discussion about release goals
> > for jessie.
> 
> I would like to propose full UTF-8 support.  I don't mean here full
> support for all of Unicode's finer points, merely complete eradication of
> mojibake.  That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> as equal to "a"+"combining ¨" is out of scope of this proposal.
> 
> I propose the following sub-goals:
> 
> 1. all programs should, in their default configuration, accept UTF-8 input
>    and pass it through uncorrupted.  Having to manually specify encoding
>    is acceptable only in a programmatic interface, GUI/std{in,out,err}/
>    command line/plain files should work with nothing but LC_CTYPE.

As an addendum to this release goal proposal, working multibyte character
support in coreutils may also be worth mentioning as a possible goal.

From http://bugs.debian.org/139861 :

$ echo -e "日\n本\nで\nは" | sort -u | wc -l
3
$ echo -e "日\n本\nで\nは" | sort | wc -l
4
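The four input lines are four distinct characters, so a multibyte-aware
sort -u should arguably print four lines, just like plain sort. A minimal
Python sketch of the expected result (an illustration only, not how coreutils
implements sorting), comparing lines by Unicode code point, which for UTF-8
gives the same order as comparing the raw bytes:

```python
# Illustration only: deduplicate and sort lines by Unicode code point.
# For UTF-8, code point order and byte order agree, so this is also what
# a plain byte-wise comparison would produce.
lines = "日\n本\nで\nは".split("\n")

unique_sorted = sorted(set(lines))  # set() drops exact duplicates only
for s in unique_sorted:
    print(s, [hex(ord(c)) for c in s], s.encode("utf-8"))

print(len(unique_sorted))  # four distinct lines survive
```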

Having head/tail count characters instead of bytes (as -c currently does)
would be sweet as well.
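To show why byte counting bites with UTF-8, here is a small Python sketch.
Both helpers, head_bytes and head_chars, are hypothetical names made up for
this illustration: the first mimics a byte-based "head -c", the second a
character-based variant, where "character" simply means code point.

```python
def head_bytes(text: str, n: int) -> bytes:
    """Hypothetical: first n bytes of the UTF-8 encoding, like head -c n."""
    return text.encode("utf-8")[:n]


def head_chars(text: str, n: int) -> str:
    """Hypothetical character-based variant: first n code points."""
    return text[:n]


sample = "日本では"  # 4 characters, 12 bytes in UTF-8 (3 bytes each)

cut = head_bytes(sample, 4)  # stops after the first byte of the second character
print(cut.decode("utf-8", errors="replace"))  # the dangling byte becomes U+FFFD
print(head_chars(sample, 2))  # two whole characters
```

The byte-based cut leaves a dangling lead byte, which is exactly the kind of
mojibake this release goal is about. Combining sequences are still out of
scope here, as in Adam's proposal.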

While upstream doesn't seem to support this, Fedora appears to carry a patch
for coreutils:

http://pkgs.fedoraproject.org/cgit/coreutils.git/tree/coreutils-i18n.patch?id=6e10f376996b64f538259091a524df2249b653fb;id2=HEAD

or also:

http://trac.cross-lfs.org/browser/patches/coreutils-6.12-unicode-1.patch?rev=577dd2d59133e10bd32c58844293e93af0e6f162

cheers, josch
