
Bug#502356: locales: sv_SE locale sometimes fails to collate 'v' and 'w' correctly



Package: locales
Version: 2.3.6.ds1-13etch7
Severity: normal
Tags: l10n


I was doing a bit of C++ programming, replacing my own Swedish collation algorithm with
the standard locales (through the standard C++ std::locale interface), when my unit tests
started to fail. It turned out I could reproduce the failure with the standard sort
utility, so that's what I'll use here.
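
For readers who want to try the std::locale route directly, here is a minimal
sketch of how such a check could look. It is not the actual unit test from
this report; the function name sort_with_locale is made up for illustration,
and it assumes the requested locale has been generated on the system.

```cpp
#include <algorithm>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: sort words using the collation rules of the
// named locale. The std::locale constructor throws std::runtime_error
// if that locale is not available on the system.
std::vector<std::string> sort_with_locale(const char* locale_name,
                                          std::vector<std::string> words)
{
    const std::locale loc(locale_name);
    // std::locale::operator() compares two strings through the locale's
    // std::collate facet, so the locale object can serve as comparator.
    std::sort(words.begin(), words.end(), loc);
    return words;
}
```

Calling sort_with_locale("sv_SE.iso88591", {"wword", "vword"}) exercises the
same collation data that sort(1) uses, so it would be expected to show the
same ordering as the sort examples in this report.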

This quote from /usr/share/i18n/locales/sv_SE describes what the locale intends to
implement, and it's also the rule I am familiar with from real life:

% The letter w is normally not present in the Swedish alphabet. It
% exists in some names in Swedish and foreign words, but is accounted
% for as a variant of 'v'.  Words and names with 'w' are in Swedish
% ordered alphabetically among the words and names with 'v'. If two
% words or names are only to be distinguished by 'v' or % 'w', 'v' is
% placed before 'w'.

And that seems to work *some* of the time ... out of the following three examples,
the first two are OK and show how it should work. The third is simply wrong:
"wword" and "vword" are identical except that "wword" starts with the 'w' variant
of the letter 'v', so "wword" should collate last.

tuva:~> /bin/echo -e "word\nvorm" | env LC_COLLATE=sv_SE.iso88591 sort
word
vorm
tuva:~> /bin/echo -e "word\nvord" | env LC_COLLATE=sv_SE.iso88591 sort
vord
word
tuva:~> /bin/echo -e "vword\nwword" | env LC_COLLATE=sv_SE.iso88591 sort
wword
vword
tuva:~> 
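
The ordering expected from the quoted rule can be sketched as a two-level
comparison. This is only an illustration under simplifying assumptions, not
the real sv_SE definition and not the hand-written algorithm mentioned above:
it handles plain ASCII words and ignores the case and accent levels a real
locale has. The names fold_w_to_v and swedish_vw_less are hypothetical.

```cpp
#include <string>

// First-level key: treat 'w'/'W' as the same letter as 'v'/'V'.
std::string fold_w_to_v(std::string s)
{
    for (char& c : s) {
        if (c == 'w') c = 'v';
        else if (c == 'W') c = 'V';
    }
    return s;
}

// Returns true if a should sort before b under the quoted rule.
bool swedish_vw_less(const std::string& a, const std::string& b)
{
    const std::string ka = fold_w_to_v(a);
    const std::string kb = fold_w_to_v(b);
    if (ka != kb)
        return ka < kb;  // first level: 'w' is ordered among the 'v' words
    return a < b;        // tie-break: 'v' (0x76) sorts before 'w' (0x77)
}
```

With this comparison "word" sorts before "vorm", "vord" before "word", and
"vword" before "wword"; the last pair is the one where sort gives the
opposite order.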

I have not done any further experiments to see what triggers it. I cannot help
suspecting that similar rules for other languages are affected as well ...

Final side note: Solaris 8 passes this test. That's the only other Unix I've tested.

regards,
Jorgen

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: powerpc (ppc)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-3-powerpc
Locale: LANG=sv_SE.utf8, LC_CTYPE=sv_SE.utf8 (charmap=UTF-8)

Versions of packages locales depends on:
ii  debconf [debconf-2.0]  1.5.11etch2       Debian configuration management sy
ii  libc6 [glibc-2.3.6.ds1 2.3.6.ds1-13etch7 GNU C Library: Shared libraries

locales recommends no packages.

-- debconf information:
  locales/default_environment_locale: en_US
  locales/locales_to_be_generated: en_US ISO-8859-1, sv_SE.UTF-8 UTF-8, sv_SE ISO-8859-1


