Bug#502356: locales: sv_SE locale sometimes fails to collate 'v' and 'w' correctly
Package: locales
Version: 2.3.6.ds1-13etch7
Severity: normal
Tags: l10n
I was doing a bit of C++ programming, replacing my own Swedish collation algorithm with
the standard locales (through the standard C++ std::locale interface), when my unit tests
started to fail. It turned out I could reproduce the failure with the standard sort
utility, so that's what I'll use here.
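For reference, sorting through the std::locale interface can be sketched roughly
as follows. This is a minimal sketch, not the code from my unit tests: the helper
name sort_with_locale is made up for illustration, and it assumes the named locale
has been generated on the system (otherwise the std::locale constructor throws and
the helper returns false).

```cpp
#include <algorithm>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: sorts `words` in place using the collation rules of
// the named locale. Returns false if that locale is not available.
bool sort_with_locale(const std::string& name, std::vector<std::string>& words) {
    try {
        // Throws std::runtime_error if the locale name is unknown.
        std::locale loc(name.c_str());
        // std::locale::operator() compares two strings through the locale's
        // collate facet, so the locale object can be used as the comparator.
        std::sort(words.begin(), words.end(), loc);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Calling it with the locale name "sv_SE.iso88591" and the same word pairs as in the
sort examples should exercise the same collation data as the sort utility does.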
This quote from /usr/share/i18n/locales/sv_SE describes what the locale intends to
implement, and it's also the rule I am familiar with from real life:
% The letter w is normally not present in the Swedish alphabet. It
% exists in some names in Swedish and foreign words, but is accounted
% for as a variant of 'v'. Words and names with 'w' are in Swedish
% ordered alphabetically among the words and names with 'v'. If two
% words or names are only to be distinguished by 'v' or % 'w', 'v' is
% placed before 'w'.
And that seems to work *some* of the time ... of the following three examples,
the first two are OK and show how it should work. The third is simply wrong:
"wword" and "vword" differ only in the first letter, where "wword" has the 'w'
variant of the letter 'v', so "wword" should collate last.
tuva:~> /bin/echo -e "word\nvorm" | env LC_COLLATE=sv_SE.iso88591 sort
word
vorm
tuva:~> /bin/echo -e "word\nvord" | env LC_COLLATE=sv_SE.iso88591 sort
vord
word
tuva:~> /bin/echo -e "vword\nwword" | env LC_COLLATE=sv_SE.iso88591 sort
wword
vword
tuva:~>
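The rule quoted from the locale file can be written down as a small comparator,
which makes the expected order for the three pairs above explicit. This is only a
sketch of the v/w rule, not of the full sv_SE collation: the names fold_vw and
sv_vw_less are made up for illustration, and all other characters are simply
compared in byte order.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: fold 'w' onto 'v' so both letters compare as equal
// in the first pass.
static char fold_vw(char c) {
    if (c == 'w') return 'v';
    if (c == 'W') return 'V';
    return c;
}

// Returns true if `a` should sort before `b` under the simplified rule.
// Only the v/w rule is modelled; everything else is plain byte order.
bool sv_vw_less(const std::string& a, const std::string& b) {
    // Pass 1: compare with 'w' treated as a variant of 'v'.
    std::string fa(a), fb(b);
    std::transform(fa.begin(), fa.end(), fa.begin(), fold_vw);
    std::transform(fb.begin(), fb.end(), fb.begin(), fold_vw);
    if (fa != fb) return fa < fb;
    // Pass 2: the words differ only by v/w, so the raw order decides;
    // 'v' precedes 'w' in ASCII, which places 'v' first.
    return a < b;
}
```

With this comparator "word" sorts before "vorm", "vord" before "word", and
"vword" before "wword"; the last pair is the one the sv_SE locale gets wrong.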
I have not done any further experiments to see what triggers it. I cannot help
suspecting that similar rules for other languages are affected as well ...
Final side note: Solaris 8 passes this test. That's the only other Unix I've tested.
regards,
Jorgen
-- System Information:
Debian Release: 4.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: powerpc (ppc)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-3-powerpc
Locale: LANG=sv_SE.utf8, LC_CTYPE=sv_SE.utf8 (charmap=UTF-8)
Versions of packages locales depends on:
ii debconf [debconf-2.0] 1.5.11etch2 Debian configuration management sy
ii libc6 [glibc-2.3.6.ds1 2.3.6.ds1-13etch7 GNU C Library: Shared libraries
locales recommends no packages.
-- debconf information:
locales/default_environment_locale: en_US
locales/locales_to_be_generated: en_US ISO-8859-1, sv_SE.UTF-8 UTF-8, sv_SE ISO-8859-1