
Bug#570929: Hungarian locale: "zs" is treated as a single letter, with undesirable consequences



Clint Adams wrote:
> On Wed, Feb 24, 2010 at 10:13:09AM +0100, Andras Korn wrote:
>> No, because apparently [.zs.] isn't a valid collating element:
>
> Should it be?

Yes, I think so: in Hungarian, zs is a separate letter that sorts after z.  See
http://lists.mysql.com/mysql/204718 for example.

glibc thinks so too, AFAICT.  From localedata/locales/hu_HU:

 collating-symbol  <zs>
 collating-element <Z-S> from "<U005A><U0053>"
 collating-element <Z-s> from "<U005A><U0073>"
 collating-element <z-S> from "<U007A><U0053>"
 collating-element <z-s> from "<U007A><U0073>"
 collating-element <Z-Z-S> from "<U005A><U005A><U0053>"
 collating-element <Z-Z-s> from "<U005A><U005A><U0073>"
 collating-element <Z-z-S> from "<U005A><U007A><U0053>"
 collating-element <Z-z-s> from "<U005A><U007A><U0073>"
 collating-element <z-Z-S> from "<U007A><U005A><U0053>"
 collating-element <z-Z-s> from "<U007A><U005A><U0073>"
 collating-element <z-z-S> from "<U007A><U007A><U0053>"
 collating-element <z-z-s> from "<U007A><U007A><U0073>"

Anyway, I decided to try a collating element from another language:
in Welsh, ch counts as a single letter for collation.

 $ echo 'ch and more' | LANG=cy_GB.UTF-8 sed 's/./<MATCHED>/'
 <MATCHED>h and more
 $ echo 'ch and more' | LANG=cy_GB.UTF-8 sed 's/[^a]/<MATCHED>/'
 <MATCHED> and more
 $ echo 'ch and more' | LANG=cy_GB.UTF-8 sed 's/[[.ch.]]/<MATCHED>/'
 sed: -e expression #1, char 21: Invalid collation character

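Odd, no?  [^a] consumes ch as a single collating element, yet
[[.ch.]] is rejected as an invalid collation character.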
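To compare how sed treats a [[.ELEM.]] bracket expression across
locales, one could wrap the test in a small helper.  This is a minimal
sketch: the helper name check_collating is hypothetical, and it assumes
a sed that exits non-zero when its regex fails to compile (GNU sed
does).  A locale that isn't installed (check with `locale -a`) falls
back to C, where no multi-character collating elements exist, so a
rejection there says nothing about the locale you asked for.

```shell
#!/bin/sh
# check_collating LOCALE ELEM (hypothetical helper)
# Prints whether sed compiles the bracket expression [[.ELEM.]] under
# LOCALE.  Relies on sed exiting non-zero when the regex is invalid.
check_collating() {
    loc=$1
    elem=$2
    # The pipeline's exit status is sed's; its output and errors are discarded.
    if printf '%s\n' "$elem" |
        LC_ALL=$loc sed "s/[[.$elem.]]/<MATCHED>/" >/dev/null 2>&1; then
        echo "$loc: [[.$elem.]] accepted"
    else
        echo "$loc: [[.$elem.]] rejected"
    fi
}

# The C locale defines no multi-character collating elements.
check_collating C zs
# Only meaningful if these locales are actually installed.
check_collating hu_HU.UTF-8 zs
check_collating cy_GB.UTF-8 ch
```
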


