
Re: UTF-8 in jessie



Hi,

Quoting Adam Borowski (2013-08-12 02:51:52)
> On Mon, May 06, 2013 at 02:49:57PM +0200, Andreas Beckmann wrote:
> > now might be the right time to start a discussion about release goals
> > for jessie.
> 
> I would like to propose full UTF-8 support.  I don't mean here full
> support for all of Unicode's finer points, merely complete eradication of
> mojibake.  That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> as equal to "a""combining ¨" is out of scope of this proposal.
> 
> I propose the following sub-goals:
> 
> 1. all programs should, in their default configuration, accept UTF-8 input
>    and pass it through uncorrupted.  Having to manually specify encoding
>    is acceptable only in a programmatic interface, GUI/std{in,out,err}/
>    command line/plain files should work with nothing but LC_CTYPE.

As an addendum to this release goal proposal, it may also be worth mentioning
working multibyte character support in coreutils as a possible goal.

From http://bugs.debian.org/139861 :

$ echo -e "日\n本\nで\nは" | sort -u | wc -l
3
$ echo -e "日\n本\nで\nは" | sort | wc -l
4

Having head/tail work on characters instead of bytes would be sweet as well.
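
To illustrate what byte-based means here, a minimal sketch, assuming the
shell and terminal use UTF-8 (the variable names are only for illustration):

```shell
# "日本" is two characters but six bytes in UTF-8 (three bytes each).
total_bytes=$(printf '日本' | wc -c)

# head -c counts bytes, so asking for 4 keeps "日" plus only the first
# byte of "本", which leaves an incomplete multibyte sequence.
kept_bytes=$(printf '日本' | head -c 4 | wc -c)

echo "$total_bytes $kept_bytes"
```

A character-aware head would instead stop on a character boundary.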

While upstream doesn't seem to support this, it seems that Fedora has a patch
for coreutils:

http://pkgs.fedoraproject.org/cgit/coreutils.git/tree/coreutils-i18n.patch?id=6e10f376996b64f538259091a524df2249b653fb;id2=HEAD

or also:

http://trac.cross-lfs.org/browser/patches/coreutils-6.12-unicode-1.patch?rev=577dd2d59133e10bd32c58844293e93af0e6f162

cheers, josch

