Bug#265163: locales: locale.alias aliases some names to unsupported locales
Package: locales
Version: 2.3.2.ds1-15
Severity: normal
Tags: upstream
Some of the locale aliases in /etc/locale.alias map names to unsupported
locales.  Namely, the "eucJP" and "eucKR" codesets aren't spelled the way
/usr/share/i18n/SUPPORTED spells them ("EUC-JP" and "EUC-KR"), and the
"SJIS" codeset isn't supported at all.
I'm attaching two files:
* A Python script I wrote that found this problem.
* A patch to correct the problem.  I corrected all but one problem; I had
  to drop the alias for "japanese.sjis", as adding support for the SJIS
  character set to glibc is beyond my ability, and I don't even know if
  that's a desirable solution.
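The check can be sketched in a few lines, independent of the files on any particular system. This is only an illustrative sketch, not the attached script: the function name `unsupported_aliases` and the sample data are hypothetical, and the sample entries are modelled on the ones quoted in this report rather than copied from a real SUPPORTED file.

```python
def unsupported_aliases(alias_lines, supported_lines):
    """Return (alias, target) pairs whose target is not a supported locale."""
    # Accept both the bare name ("it_IT") and the name with its codeset
    # appended ("it_IT.ISO-8859-1"), since aliases spell out the codeset.
    supported = set()
    for line in supported_lines:
        fields = line.split()
        if len(fields) != 2 or fields[0].startswith("#"):
            continue  # skip blank lines, comments, and malformed entries
        name, codeset = fields
        supported.add(name)
        if "." not in name:
            supported.add(name + "." + codeset)

    bad = []
    for line in alias_lines:
        fields = line.split()
        if len(fields) != 2 or fields[0].startswith("#"):
            continue
        alias, target = fields
        if target not in supported:
            bad.append((alias, target))
    return bad


# Hypothetical sample data modelled on the entries quoted in this report.
SUPPORTED_SAMPLE = [
    "ja_JP.EUC-JP EUC-JP",
    "ko_KR.EUC-KR EUC-KR",
    "it_IT ISO-8859-1",
]
ALIAS_SAMPLE = [
    "# comment",
    "italian         it_IT.ISO-8859-1",
    "japanese\tja_JP.eucJP",
    "japanese.sjis\tja_JP.SJIS",
    "korean\t\tko_KR.EUC-KR",
]

for alias, target in unsupported_aliases(ALIAS_SAMPLE, SUPPORTED_SAMPLE):
    print("%s -> %s" % (alias, target))
```

With the sample data, only the "eucJP" and "SJIS" targets are reported; the correctly spelled "EUC-KR" target and the ISO-8859-1 target pass.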
Thanks for looking into this.
-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: powerpc (ppc)
Kernel: Linux 2.4.25-powerpc-smp
Locale: LANG=C, LC_CTYPE=en_US.UTF-8
Versions of packages locales depends on:
ii  debconf                     1.4.30       Debian configuration management sy
ii  libc6 [glibc-2.3.2.ds1-15]  2.3.2.ds1-15 GNU C Library: Shared libraries an
-- debconf information:
* locales/default_environment_locale: None
* locales/locales_to_be_generated: en_US ISO-8859-1, en_US.ISO-8859-15 ISO-8859-15, en_US.UTF-8 UTF-8
#!/usr/bin/python
import os
import re
RUNTIME_DEBUG = True
# Build a dictionary of canonical locales according to the GNU C library.  The
# keys in this dictionary are the locale names, and the values are the character
# sets used by each locale name.
glibc_locales_canonical = { }
glibc_locale_file = open(os.path.join("/", "usr", "share", "i18n", "SUPPORTED"))
for line in glibc_locale_file.readlines():
    (left_side, right_side) = re.split(r'\s', line, 1)
    glibc_locales_canonical[(left_side.strip())] = right_side.strip()
glibc_locale_file.close()
if RUNTIME_DEBUG:
    print("Canonical glibc locales: %s" % (glibc_locales_canonical.keys(),))
glibc_locales_aliased = { }
glibc_alias_file = open(os.path.join("/", "etc", "locale.alias"))
for line in glibc_alias_file.readlines():
    # Ignore blank lines and lines beginning with a comment character.
    if re.match(r'$', line) \
      or re.match(r'#', line):
        continue
    (left_side, right_side) = re.split(r'\s', line, 1)
    glibc_locales_aliased[(left_side.strip())] = right_side.strip()
    # glibc is a little weird; it aliases names to locale specifications
    # *including* the codeset, whereas it omits the codeset from the officially
    # supported list except when necessary for disambiguation purposes.
    # Consequently, if we don't find the alias's target in the canonical list,
    # we have to fall back to seeing if it is in the canonical list using the
    # same codeset that is explicitly stated.
    if right_side.strip() not in glibc_locales_canonical.keys():
        # Try harder to find it.
        goal_locale = right_side.strip()
        found = False
        for locale in glibc_locales_canonical.keys():
            # Only names without an explicit codeset need one appended.
            if not re.search(r'\.', locale):
                locale_with_codeset = '.'.join([ locale,
                                               glibc_locales_canonical[locale] ])
                if goal_locale == locale_with_codeset:
                    found = True
                    break
        if not found:
            print("Warning: glibc bug: glibc locale %s is aliased to"
                  " non-canonical glibc locale %s"
                  % (left_side.strip(), right_side.strip()))
glibc_alias_file.close()
if RUNTIME_DEBUG:
    print("Aliased glibc locales: %s" % (glibc_locales_aliased.keys(),))
# vim:set ai et sts=4 sw=4 tw=80:
--- /etc/locale.alias.dpkg-dist	2004-08-11 19:15:44.000000000 -0500
+++ /etc/locale.alias	2004-08-11 19:17:57.000000000 -0500
@@ -49,14 +49,13 @@
 hungarian       hu_HU.ISO-8859-2
 icelandic       is_IS.ISO-8859-1
 italian         it_IT.ISO-8859-1
-japanese	ja_JP.eucJP
-japanese.euc	ja_JP.eucJP
-ja_JP		ja_JP.eucJP
-ja_JP.ujis	ja_JP.eucJP
-japanese.sjis	ja_JP.SJIS
-korean		ko_KR.eucKR
-korean.euc 	ko_KR.eucKR
-ko_KR		ko_KR.eucKR
+japanese	ja_JP.EUC-JP
+japanese.euc	ja_JP.EUC-JP
+ja_JP		ja_JP.EUC-JP
+ja_JP.ujis	ja_JP.EUC-JP
+korean		ko_KR.EUC-KR
+korean.euc 	ko_KR.EUC-KR
+ko_KR		ko_KR.EUC-KR
 lithuanian      lt_LT.ISO-8859-13
 norwegian       no_NO.ISO-8859-1
 nynorsk		nn_NO.ISO-8859-1
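
The effect of the patch on a single alias line can be sketched as a small transformation. This is a hedged illustration only: the names `RENAMES`, `DROPPED_CODESETS`, and `fix_alias_line` are hypothetical, and the two tables contain just the codesets touched by the patch above.

```python
# Codeset spellings changed by the patch, and the codeset whose alias it drops.
RENAMES = {"eucJP": "EUC-JP", "eucKR": "EUC-KR"}
DROPPED_CODESETS = {"SJIS"}


def fix_alias_line(line):
    """Return the corrected line, or None if the alias should be dropped."""
    fields = line.split()
    if len(fields) != 2 or fields[0].startswith("#"):
        return line  # blank lines and comments pass through unchanged
    alias, target = fields
    locale_name, dot, codeset = target.partition(".")
    if codeset in DROPPED_CODESETS:
        return None
    if codeset in RENAMES:
        # The target is the last field, so rewrite its last occurrence and
        # keep the original whitespace around it.
        start = line.rfind(target)
        new_target = locale_name + dot + RENAMES[codeset]
        return line[:start] + new_target + line[start + len(target):]
    return line


for sample in ["japanese\tja_JP.eucJP", "japanese.sjis\tja_JP.SJIS"]:
    print(repr(fix_alias_line(sample)))
```

Lines whose codeset appears in neither table are returned unchanged, which matches the context lines in the diff.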