Re: UTF-8 in jessie
Hi,
Quoting Adam Borowski (2013-08-12 02:51:52)
> On Mon, May 06, 2013 at 02:49:57PM +0200, Andreas Beckmann wrote:
> > now might be the right time to start a discussion about release goals
> > for jessie.
>
> I would like to propose full UTF-8 support. I don't mean here full
> support for all of Unicode's finer points, merely complete eradication of
> mojibake. That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> as equal to "a""combining ¨" is out of scope of this proposal.
>
> I propose the following sub-goals:
>
> 1. all programs should, in their default configuration, accept UTF-8 input
> and pass it through uncorrupted. Having to manually specify encoding
> is acceptable only in a programmatic interface, GUI/std{in,out,err}/
> command line/plain files should work with nothing but LC_CTYPE.
As an addendum to this release goal proposal, it may also be worth mentioning
working multibyte character support in coreutils as a possible goal.
From http://bugs.debian.org/139861 :
$ echo -e "日\n本\nで\nは" | sort -u | wc -l
3
$ echo -e "日\n本\nで\nは" | sort | wc -l
4
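As a stopgap, and only as a sketch: forcing the C locale makes sort compare raw bytes, so two different UTF-8 sequences can never collate as equal. The helper name count_unique below is made up for illustration, and the price is that the output is ordered by byte value (which for UTF-8 is code point order) rather than by the locale's collation rules.

```shell
# Sketch of a workaround, not a fix for the underlying bug.
# count_unique is a hypothetical helper name, not part of coreutils.
count_unique() {
    # LC_ALL=C makes sort compare raw bytes, so distinct characters
    # are never treated as duplicates by -u.
    # tr strips the padding some wc implementations add.
    LC_ALL=C sort -u | wc -l | tr -d ' '
}

# printf is used instead of "echo -e", which is not portable across shells.
printf '%s\n' 日 本 で は | count_unique    # prints 4
```

This only sidesteps the problem for uniqueness checks; anything that needs proper locale-aware ordering still depends on multibyte support in sort itself.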
Having head/tail work on characters instead of bytes would be sweet as
well.
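To illustrate the byte-based behaviour, here is a small sketch, assuming a head that supports -c (GNU and BSD both do) and a script saved as UTF-8. The helper name first_bytes_hex is hypothetical; it dumps whatever head -c keeps as hex, so the cut is visible regardless of the terminal.

```shell
# Sketch: head -c counts bytes, so it can cut a multibyte character in half.
# first_bytes_hex is a hypothetical helper name used only for illustration.
first_bytes_hex() {
    # $1 = number of bytes to keep; stdin = text; output = kept bytes as hex
    head -c "$1" | od -An -tx1 | tr -d ' \n'
}

# Four characters, three bytes each in UTF-8.
printf '%s' '日本では' | wc -c | tr -d ' '    # prints 12

# Keeping 4 bytes yields all of 日 (e6 97 a5) plus one stray byte of 本 (e6).
printf '%s' '日本では' | first_bytes_hex 4    # prints e697a5e6
echo
```

The trailing lone e6 is not valid UTF-8 on its own, which is exactly the kind of output a character-aware head would avoid.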
While upstream doesn't seem to support this, Fedora appears to carry a patch
for coreutils:
http://pkgs.fedoraproject.org/cgit/coreutils.git/tree/coreutils-i18n.patch?id=6e10f376996b64f538259091a524df2249b653fb;id2=HEAD
or also:
http://trac.cross-lfs.org/browser/patches/coreutils-6.12-unicode-1.patch?rev=577dd2d59133e10bd32c58844293e93af0e6f162
cheers, josch