
Re: UTF-8 in jessie


Quoting Adam Borowski (2013-08-12 02:51:52)
> On Mon, May 06, 2013 at 02:49:57PM +0200, Andreas Beckmann wrote:
> > now might be the right time to start a discussion about release goals
> > for jessie.
> I would like to propose full UTF-8 support.  I don't mean here full
> support for all of Unicode's finer points, merely complete eradication of
> mojibake.  That is, ensuring that /m.o/ matches "möo", or that "ä" sorts
> as equal to "a""combining ¨" is out of scope of this proposal.
> I propose the following sub-goals:
> 1. all programs should, in their default configuration, accept UTF-8 input
>    and pass it through uncorrupted.  Having to manually specify encoding
>    is acceptable only in a programmatic interface, GUI/std{in,out,err}/
>    command line/plain files should work with nothing but LC_CTYPE.

As an addendum to this release goal proposal, it may also be worth mentioning
working multibyte character support in coreutils as a possible goal.

From http://bugs.debian.org/139861 :

$ echo -e "日\n本\nで\nは" | sort -u | wc -l
$ echo -e "日\n本\nで\nは" | sort | wc -l
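
For comparison, here is a minimal sketch (not taken from the bug report) that
forces the C locale for sort only. In that locale sort compares raw bytes, so
the four lines are four distinct byte strings and none of them is merged. The
variable name unique_lines is just for illustration, and printf is used
instead of echo -e for portability.

```shell
# Sketch: override the locale for sort only, so it compares raw bytes.
# The four lines are distinct byte sequences, so -u keeps all of them.
unique_lines=$(printf '日\n本\nで\nは\n' | LC_ALL=C sort -u | wc -l | tr -d ' ')
echo "$unique_lines"   # prints 4
```

This only sidesteps locale-aware collation; it is not a substitute for proper
multibyte support.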

Having head/tail work on characters instead of bytes would be sweet as well.
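
To illustrate the byte-based behaviour, here is a small sketch. It assumes the
text is UTF-8 encoded, where each of these four characters takes three bytes.
The variable names are made up for the example. Asking head for four bytes
yields one complete character plus the first byte of the next one, which is
exactly the kind of truncation that ends up as mojibake.

```shell
# Each of these four characters is encoded as 3 bytes in UTF-8.
text='日本では'

# wc -c counts bytes, not characters.
byte_count=$(printf '%s' "$text" | wc -c | tr -d ' ')

# head -c also counts bytes: 4 bytes is one whole character (e6 97 a5)
# plus the stray lead byte (e6) of the second character.
cut_hex=$(printf '%s' "$text" | head -c 4 | od -An -tx1 | tr -d ' \n')

echo "$byte_count bytes, first 4 as hex: $cut_hex"
# prints: 12 bytes, first 4 as hex: e697a5e6
```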

While upstream doesn't seem to support this, it seems that Fedora has a patch
for coreutils:


or also:


cheers, josch
