
Bug#463018: problem with cyrillic letter IO and letter case issue in utf8_general_ci



Package: general
Severity: normal

I have a problem in MySQL (5.0.32-Debian_7etch5-log, Debian etch distribution).
This report is best viewed on a wide screen.

It concerns these Cyrillic characters:
Ёё compared with Ее, and
Йй compared with Ии

Characters (the first code is the cp1251 byte):
е 0xE5 = U+0435 : CYRILLIC SMALL LETTER IE
Е 0xC5 = U+0415 : CYRILLIC CAPITAL LETTER IE
ё 0xB8 = U+0451 : CYRILLIC SMALL LETTER IO
Ё 0xA8 = U+0401 : CYRILLIC CAPITAL LETTER IO
и 0xE8 = U+0438 : CYRILLIC SMALL LETTER I
И 0xC8 = U+0418 : CYRILLIC CAPITAL LETTER I
й 0xE9 = U+0439 : CYRILLIC SMALL LETTER SHORT I
Й 0xC9 = U+0419 : CYRILLIC CAPITAL LETTER SHORT I 

from http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
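
The table above can be cross-checked outside MySQL. A minimal sketch in Python, assuming its built-in cp1251 codec follows the same mapping file; describe() is a helper name made up for this illustration:

```python
import unicodedata

# The eight letters listed in the table above.
LETTERS = "еЕёЁиИйЙ"

def describe(ch):
    """Return (cp1251 byte as hex, Unicode code point, Unicode name) for one character."""
    cp1251_hex = "0x" + ch.encode("cp1251").hex().upper()
    code_point = f"U+{ord(ch):04X}"
    return cp1251_hex, code_point, unicodedata.name(ch)

for ch in LETTERS:
    print(ch, *describe(ch))
```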

I compared several collations in the cp1251 and utf8 character sets.
The SQL SELECT looks like this (cp1251 hex codes on the line below it):
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
        B8    A8   E9    C9   7A    5A   E5    B8   E8    E9   C8    C9   E5     B8   C5    A8   E8    E9   A8    B8   C9    E9   C1    E1   4C    6C

Compare case-sensitive collations
---------------------------------

mysql> set names cp1251 collate cp1251_general_cs;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         0 |         0 |         0 |         0 |         0 |         0 |          0 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Correct

mysql> set names utf8 collate utf8_bin;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         0 |         0 |         0 |         0 |         0 |         0 |          1 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Wrong!

The 1 (true) in the second result (column 7, 'е' >= 'ё') is an error. It has to be 0 (false); this is a sorting issue.
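
To show what a purely binary comparison of the UTF-8 encodings would give, here is a rough sketch in Python. It is only a model of byte-wise comparison, not MySQL's code; binary_compare() and PAIRS are names invented for the example. The last line repeats the seventh comparison on cp1251 bytes for contrast only, without claiming that this is what the server did.

```python
# The 13 comparisons from the SELECT, in column order.
PAIRS = [
    ("ё", "=", "Ё"), ("й", "=", "Й"), ("z", "=", "Z"),
    ("е", "=", "ё"), ("и", "=", "й"), ("И", "=", "Й"),
    ("е", ">=", "ё"), ("Е", "=", "Ё"), ("и", ">", "й"),
    ("Ё", ">", "ё"), ("Й", ">", "й"), ("Б", ">", "б"),
    ("L", ">", "l"),
]

def binary_compare(left, op, right, encoding="utf-8"):
    """Compare two strings byte by byte in the given encoding; return 1 or 0."""
    a, b = left.encode(encoding), right.encode(encoding)
    if op == "=":
        return int(a == b)
    if op == ">":
        return int(a > b)
    if op == ">=":
        return int(a >= b)
    raise ValueError(f"unsupported operator: {op}")

# Byte-wise comparison of the UTF-8 encodings, one value per column.
utf8_row = [binary_compare(*pair) for pair in PAIRS]
print(utf8_row)

# Column 7 again, but on the cp1251 bytes (E5 and B8), for contrast.
print(binary_compare("е", ">=", "ё", encoding="cp1251"))
```

In UTF-8, 'е' is D0 B5 and 'ё' is D1 91, so a byte-wise 'е' >= 'ё' is false, which matches the 0 expected here.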


Compare case-insensitive collations
-----------------------------------

mysql> set names cp1251 collate cp1251_general_ci;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         1 |         1 |         1 |         0 |         0 |         0 |          0 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Correct

mysql> set names utf8 collate utf8_general_ci;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         0 |         0 |         1 |         0 |         0 |         0 |          1 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Wrong in columns (counting from 1): 1, 2 and 7. The first 3 columns have to be 1 and the rest 0.
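
The expected row can be written down as a small model. This Python sketch assumes that "case insensitive" means: fold both sides to lower case, then compare by code point, so that ё and е (and й and и) stay distinct letters. It only illustrates the expectation stated in this report and is not how any MySQL collation is implemented; expected_ci() is a made-up name.

```python
# The 13 comparisons from the SELECT, in column order.
COMPARISONS = [
    ("ё", "=", "Ё"), ("й", "=", "Й"), ("z", "=", "Z"),
    ("е", "=", "ё"), ("и", "=", "й"), ("И", "=", "Й"),
    ("е", ">=", "ё"), ("Е", "=", "Ё"), ("и", ">", "й"),
    ("Ё", ">", "ё"), ("Й", ">", "й"), ("Б", ">", "б"),
    ("L", ">", "l"),
]

def expected_ci(left, op, right):
    """Model of the expected result: ignore case only, keep letters distinct."""
    a, b = left.lower(), right.lower()
    if op == "=":
        return int(a == b)
    if op == ">":
        return int(a > b)
    if op == ">=":
        return int(a >= b)
    raise ValueError(f"unsupported operator: {op}")

expected_row = [expected_ci(*c) for c in COMPARISONS]
print(expected_row)
```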

mysql> set names utf8 collate utf8_unicode_ci;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         1 |         1 |         1 |         1 |         1 |         1 |          1 |         1 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Wrong in columns (counting from 1): 4, 5, 6, 7 and 8. The first 3 columns have to be 1 and the rest 0.
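
A possible reason for columns 4 to 8: in Unicode, ё and й have canonical decompositions into е and и plus a combining mark, and a collation that ignores such marks would see the pairs as equal. The Python sketch below only demonstrates the decomposition; that utf8_unicode_ci in MySQL 5.0 behaves this way internally is an assumption, and base_letter() is a name invented for the example.

```python
import unicodedata

def base_letter(ch):
    """Decompose a character (NFD) and drop its combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

for ch in "ёЁйЙ":
    parts = unicodedata.normalize("NFD", ch)
    # Show the code points the letter decomposes into, then its base letter.
    print(ch, [f"U+{ord(c):04X}" for c in parts], base_letter(ch))
```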

How to repeat:
set names cp1251 collate cp1251_general_cs;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >='ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
set names utf8 collate utf8_bin;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >='ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
set names cp1251 collate cp1251_general_ci;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >='ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
set names utf8 collate utf8_general_ci;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >='ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
set names utf8 collate utf8_unicode_ci;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >='ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-686
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=CP1251)


