
Bug#502356: marked as done (locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly)



Your message dated Thu, 16 Oct 2008 08:49:28 +0200
with message-id <48F6E3F8.4060101@aurel32.net>
and subject line Re: Bug#502356: locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly
has caused the Debian Bug report #502356,
regarding locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
502356: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502356
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: locales
Version: 2.3.6.ds1-13etch7
Severity: normal
Tags: l10n


I was doing some C++ programming, replacing my own Swedish collation algorithm with
the standard locales (through the standard C++ std::locale interface), when my unit tests
started to fail. It turned out I could reproduce the problem with the standard sort utility,
so that's what I'll use here.

This quote from /usr/share/i18n/locales/sv_SE describes what the locale intends to
implement, and it's also the rule I am familiar with from real life:

% The letter w is normally not present in the Swedish alphabet. It
% exists in some names in Swedish and foreign words, but is accounted
% for as a variant of 'v'.  Words and names with 'w' are in Swedish
% ordered alphabetically among the words and names with 'v'. If two
% words or names are only to be distinguished by 'v' or 'w', 'v' is
% placed before 'w'.

And that seems to work only *some* of the time. Of the following three examples,
the first two are correct and show how it should work. The third is simply wrong:
"wword" and "vword" are identical except that "wword" contains the 'w' variant of
the letter 'v', so "wword" should collate last.

tuva:~> /bin/echo -e "word\nvorm" | env LC_COLLATE=sv_SE.iso88591 sort
word
vorm
tuva:~> /bin/echo -e "word\nvord" | env LC_COLLATE=sv_SE.iso88591 sort
vord
word
tuva:~> /bin/echo -e "vword\nwword" | env LC_COLLATE=sv_SE.iso88591 sort
wword
vword
tuva:~> 
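A minimal C++ sketch of the same check through the std::locale interface mentioned above. It is illustrative, not the reporter's unit-test code, and the function names are hypothetical. It relies on std::locale's operator(), which compares two strings using the locale's std::collate facet, so the locale object itself can serve as the std::sort comparator.

```cpp
#include <algorithm>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: sort words with the collation rules of a named locale.
// std::locale::operator() compares two std::strings via the locale's
// std::collate facet, so the locale object is passed to std::sort directly.
// The std::locale constructor throws std::runtime_error if the locale
// has not been generated on this system.
std::vector<std::string> sort_with_locale(std::vector<std::string> words,
                                          const std::string& locale_name) {
    const std::locale loc(locale_name.c_str());
    std::sort(words.begin(), words.end(), loc);
    return words;
}

// Hypothetical wrapper: returns an empty vector instead of throwing when
// the requested locale (e.g. "sv_SE.iso88591") is not available.
std::vector<std::string> try_sort_with_locale(const std::vector<std::string>& words,
                                              const std::string& locale_name) {
    try {
        return sort_with_locale(words, locale_name);
    } catch (const std::runtime_error&) {
        return {};
    }
}
```

With sv_SE ISO-8859-1 generated, `try_sort_with_locale({"wword", "vword"}, "sv_SE.iso88591")` should put "vword" first according to the rule quoted from the locale file; presumably an affected glibc gives the same wrong order as sort, since both go through glibc's collation data. An empty result means the locale is not installed.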

I have not done any further experiments to see what triggers it. I cannot help
suspecting that similar rules for other languages are affected as well ...
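One way to narrow down what triggers it is to compare the locale's order with a small reference comparator that encodes the rule quoted from the sv_SE file as a two-pass comparison: a primary pass with 'w' folded onto 'v', then a tie-break that places 'v' before 'w'. This is a hypothetical, deliberately simplified sketch, not glibc's implementation: it handles only lowercase ASCII and ignores case and the rest of the Swedish alphabet (å, ä, ö).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical: map 'w' onto 'v' for the primary comparison.
static char fold_w(char c) { return c == 'w' ? 'v' : c; }

// Hypothetical reference comparator for the quoted sv_SE rule (ASCII only).
bool swedish_vw_less(const std::string& a, const std::string& b) {
    // Primary level: compare with 'w' treated as a variant of 'v'.
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const char x = fold_w(a[i]);
        const char y = fold_w(b[i]);
        if (x != y) return x < y;
    }
    if (a.size() != b.size()) return a.size() < b.size();
    // Equal at the primary level, so the strings differ only in v/w
    // positions: plain byte order puts 'v' (0x76) before 'w' (0x77).
    return a < b;
}
```

Running the same word pairs through this comparator and through the locale, and printing the pairs where they disagree, would show which inputs are collated wrongly.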

Final side note: Solaris 8 passes this test. That's the only other Unix I've tested.

regards,
Jorgen

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: powerpc (ppc)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-3-powerpc
Locale: LANG=sv_SE.utf8, LC_CTYPE=sv_SE.utf8 (charmap=UTF-8)

Versions of packages locales depends on:
ii  debconf [debconf-2.0]  1.5.11etch2       Debian configuration management sy
ii  libc6 [glibc-2.3.6.ds1 2.3.6.ds1-13etch7 GNU C Library: Shared libraries

locales recommends no packages.

-- debconf information:
  locales/default_environment_locale: en_US
  locales/locales_to_be_generated: en_US ISO-8859-1, sv_SE.UTF-8 UTF-8, sv_SE ISO-8859-1



--- End Message ---
--- Begin Message ---
Version: 2.7-1

Jorgen Grahn wrote:
> tuva:~> /bin/echo -e "vword\nwword" | env LC_COLLATE=sv_SE.iso88591 sort
> wword
> vword

The bug is fixed in glibc 2.7 and later. Closing the bug for those
versions.
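A rough runtime check for whether a program runs on one of the fixed versions. This sketch assumes the running C library version is a fair proxy for the installed locale data (on Debian the locales package is built from the same glibc source, as the matching 2.3.6.ds1-13etch7 versions in the report show). gnu_get_libc_version() is glibc-specific; the function names are hypothetical.

```cpp
#include <cstdio>   // pulls in <features.h> on glibc, which defines __GLIBC__
#include <sstream>
#include <string>

#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

// Hypothetical: true if a version string such as "2.3.6" or "2.7" is 2.7
// or later. Components are compared numerically, so "2.10" is newer than "2.7".
bool glibc_has_vw_fix(const std::string& version) {
    std::istringstream in(version);
    int major = 0, minor = 0;
    char dot = 0;
    if (!(in >> major >> dot >> minor) || dot != '.') return false;
    return major > 2 || (major == 2 && minor >= 7);
}

// Hypothetical: check the C library actually loaded at run time.
bool running_glibc_has_vw_fix() {
#ifdef __GLIBC__
    return glibc_has_vw_fix(gnu_get_libc_version());
#else
    return false;  // not glibc: this check does not apply
#endif
}
```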

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net


--- End Message ---
