Bug#570929: Hungarian locale: "zs" is treated as a single letter, with undesirable consequences

To: Clint Adams <schizo@debian.org>
Cc: Andras Korn <korn-debbugs@elan.rulez.org>, 570929@bugs.debian.org, Pirity Tamas Gabor <ptg@apaczai.elte.hu>
Subject: Bug#570929: Hungarian locale: "zs" is treated as a single letter, with undesirable consequences
From: Jonathan Nieder <jrnieder@gmail.com>
Date: Thu, 25 Feb 2010 01:50:02 -0600
Message-id: <20100225075002.GA17848@progeny.tock>
Reply-to: Jonathan Nieder <jrnieder@gmail.com>, 570929@bugs.debian.org
In-reply-to: <20100224152303.GA3664@scru.org>
References: <20100222100721.GA3620@hellgate.intra.guy> <20100223231133.GA21466@hellgate.intra.guy> <20100224042925.GA18149@progeny.tock> <20100224091309.GB21466@hellgate.intra.guy> <20100224152303.GA3664@scru.org>

Clint Adams wrote:
> On Wed, Feb 24, 2010 at 10:13:09AM +0100, Andras Korn wrote:
>> No, because apparently [.zs.] isn't a valid collating element:
>
> Should it be?

Yes, I think so: it comes after z in alphabetical order.  See
http://lists.mysql.com/mysql/204718 for example.

glibc thinks so too, AFAICT.  From localedata/locales/hu_HU:

 collating-symbol  <zs>
 collating-element <Z-S> from "<U005A><U0053>"
 collating-element <Z-s> from "<U005A><U0073>"
 collating-element <z-S> from "<U007A><U0053>"
 collating-element <z-s> from "<U007A><U0073>"
 collating-element <Z-Z-S> from "<U005A><U005A><U0053>"
 collating-element <Z-Z-s> from "<U005A><U005A><U0073>"
 collating-element <Z-z-S> from "<U005A><U007A><U0053>"
 collating-element <Z-z-s> from "<U005A><U007A><U0073>"
 collating-element <z-Z-S> from "<U007A><U005A><U0053>"
 collating-element <z-Z-s> from "<U007A><U005A><U0073>"
 collating-element <z-z-S> from "<U007A><U007A><U0053>"
 collating-element <z-z-s> from "<U007A><U007A><U0073>"

Anyway, I decided to try a multi-character collating element from another
language: in Welsh, "ch" is a single letter for collation purposes.

 $ echo 'ch and more' | LANG=cy_GB.UTF-8 sed 's/./<MATCHED>/'
 <MATCHED>h and more
 $ echo 'ch and more' | LANG=cy_GB.UTF-8 sed 's/[^a]/<MATCHED>/'
 <MATCHED> and more
 $ echo 'ch and more' | LANG=cy_GB.UTF-8 sed 's/[[.ch.]]/<MATCHED>/'
 sed: -e expression #1, char 21: Invalid collation character

Odd, no?
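The same check can be scripted so that it is easy to repeat on other machines
and other locales. This is only a rough sketch. check_collating is a made-up
helper name, not part of sed or glibc. It assumes a glibc-style `locale
charmap` that prints "UTF-8" for a UTF-8 locale. It uses LC_ALL instead of
LANG so that a stray LC_COLLATE or LC_CTYPE in the environment cannot override
the locale under test. It deliberately does not claim any particular result,
since that depends on the glibc version and on which locales are generated:

```sh
#!/bin/sh
# Sketch: ask sed whether [[.ELEMENT.]] is a valid collating symbol in LOCALE.
# check_collating is a hypothetical helper, not part of sed or glibc.
# ELEMENT is assumed to contain plain letters only (no sed/regex metacharacters).
check_collating() {
    loc=$1
    elem=$2

    # If the locale has not been generated, setlocale() fails and sed quietly
    # runs in the C locale, where any multi-character [[.xy.]] is invalid.
    # Assumption: `locale charmap` prints "UTF-8" for an installed UTF-8 locale.
    if [ "$(LC_ALL=$loc locale charmap 2>/dev/null)" != UTF-8 ]; then
        echo "$loc [[.$elem.]]: skipped (not a UTF-8 locale, or not installed)"
        return 0
    fi

    # sed exits non-zero when the regex compiler rejects the bracket
    # expression, e.g. with "Invalid collation character".
    if printf '%s\n' "$elem" | LC_ALL=$loc sed "s/[[.$elem.]]/<MATCHED>/" >/dev/null 2>&1; then
        echo "$loc [[.$elem.]]: accepted"
    else
        echo "$loc [[.$elem.]]: rejected"
    fi
}

check_collating hu_HU.UTF-8 zs
check_collating cy_GB.UTF-8 ch
```

A "skipped" line means the comparison tells us nothing; generate the locale
first (on Debian, via the locales package) and rerun.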
