Bug#502356: marked as done (locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly)

To: Aurelien Jarno <aurelien@aurel32.net>
Subject: Bug#502356: marked as done (locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly)
From: owner@bugs.debian.org (Debian Bug Tracking System)
Date: Thu, 16 Oct 2008 06:51:09 +0000
Message-id: <handler.502356.D502356.122413977513552.ackdone@bugs.debian.org>
References: <48F6E3F8.4060101@aurel32.net> <20081015205849.9048.54861.reportbug@tuva.opensoft.se>

Your message dated Thu, 16 Oct 2008 08:49:28 +0200
with message-id <48F6E3F8.4060101@aurel32.net>
and subject line Re: Bug#502356: locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly
has caused the Debian Bug report #502356,
regarding locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
502356: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502356
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems

--- Begin Message ---

To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly
From: Jorgen Grahn <grahn+debian@snipabacken.se>
Date: Wed, 15 Oct 2008 22:58:49 +0200
Message-id: <20081015205849.9048.54861.reportbug@tuva.opensoft.se>

Package: locales
Version: 2.3.6.ds1-13etch7
Severity: normal
Tags: l10n

I was doing a bit of C++ programming, replacing my own Swedish collation algorithm with
the standard locales (through the standard C++ std::locale interface), when my unit tests
started to fail. It turned out I could reproduce the failure with the standard sort utility,
so that's what I'll use here.

This quote from /usr/share/i18n/locales/sv_SE describes what the locale intends to
implement, and it's also the rule I am familiar with from real life:

% The letter w is normally not present in the Swedish alphabet. It
% exists in some names in Swedish and foreign words, but is accounted
% for as a variant of 'v'. Words and names with 'w' are in Swedish
% ordered alphabetically among the words and names with 'v'. If two
% words or names are only to be distinguished by 'v' or 'w', 'v' is
% placed before 'w'.

And that seems to work only *some* of the time. Of the following three examples,
the first two are OK and show how it should work. The third is simply wrong:
"wword" and "vword" are identical except that one contains the 'w' variant of the
letter 'v', so "wword" should collate after "vword".

tuva:~> /bin/echo -e "word\nvorm" | env LC_COLLATE=sv_SE.iso88591 sort
word
vorm
tuva:~> /bin/echo -e "word\nvord" | env LC_COLLATE=sv_SE.iso88591 sort
vord
word
tuva:~> /bin/echo -e "vword\nwword" | env LC_COLLATE=sv_SE.iso88591 sort
wword
vword
tuva:~>

I have not done any further experiments to see what triggers it. I cannot help
suspecting that similar rules for other languages are affected as well ...

Final side note: Solaris 8 passes this test. That's the only other Unix I've tested.

regards,
Jorgen

-- System Information:
Debian Release: 4.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: powerpc (ppc)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-3-powerpc
Locale: LANG=sv_SE.utf8, LC_CTYPE=sv_SE.utf8 (charmap=UTF-8)

Versions of packages locales depends on:
ii debconf [debconf-2.0] 1.5.11etch2 Debian configuration management sy
ii libc6 [glibc-2.3.6.ds1 2.3.6.ds1-13etch7 GNU C Library: Shared libraries

locales recommends no packages.

-- debconf information:
locales/default_environment_locale: en_US
locales/locales_to_be_generated: en_US ISO-8859-1, sv_SE.UTF-8 UTF-8, sv_SE ISO-8859-1

--- End Message ---

--- Begin Message ---

To: 502356-done@bugs.debian.org
Subject: Re: Bug#502356: locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly
From: Aurelien Jarno <aurelien@aurel32.net>
Date: Thu, 16 Oct 2008 08:49:28 +0200
Message-id: <48F6E3F8.4060101@aurel32.net>
In-reply-to: <20081015205849.9048.54861.reportbug@tuva.opensoft.se>
References: <20081015205849.9048.54861.reportbug@tuva.opensoft.se>

Version: 2.7-1

Jorgen Grahn wrote:
> Package: locales
> Version: 2.3.6.ds1-13etch7
> Severity: normal
> Tags: l10n
> 
> 
> I was doing a bit of C++ programming, replacing my own Swedish collation algorithm with
> the standard locales (through the standard C++ std::locale interface), when my unit tests
> started to fail. It turned out I could reproduce the failure with the standard sort utility,
> so that's what I'll use here.
> 
> This quote from /usr/share/i18n/locales/sv_SE describes what the locale intends to
> implement, and it's also the rule I am familiar with from real life:
> 
> % The letter w is normally not present in the Swedish alphabet. It
> % exists in some names in Swedish and foreign words, but is accounted
> % for as a variant of 'v'. Words and names with 'w' are in Swedish
> % ordered alphabetically among the words and names with 'v'. If two
> % words or names are only to be distinguished by 'v' or 'w', 'v' is
> % placed before 'w'.
> 
> And that seems to work only *some* of the time. Of the following three examples,
> the first two are OK and show how it should work. The third is simply wrong:
> "wword" and "vword" are identical except that one contains the 'w' variant of the
> letter 'v', so "wword" should collate after "vword".
> 
> tuva:~> /bin/echo -e "word\nvorm" | env LC_COLLATE=sv_SE.iso88591 sort
> word
> vorm
> tuva:~> /bin/echo -e "word\nvord" | env LC_COLLATE=sv_SE.iso88591 sort
> vord
> word
> tuva:~> /bin/echo -e "vword\nwword" | env LC_COLLATE=sv_SE.iso88591 sort
> wword
> vword
> tuva:~> 
> 
> I have not done any further experiments to see what triggers it. I cannot help
> suspecting that similar rules for other languages are affected as well ...
> 

The bug is fixed in glibc 2.7 and later. Closing the bug for those
versions.

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

--- End Message ---
