Bug#284137: locale -a reports misleading values for UTF-8 locales
At Sat, 4 Dec 2004 18:13:24 +0100,
Guillermo S. Romero <gsromero@alumnos.euitt.upm.es> wrote:
> Denis said I have to use UTF-8 so Xlib works, ignoring what locale -a
> reports, and you say utf8 is the right value, but that will make Xlib
> not work. Denis's reasoning:
>
> The name listed in /usr/share/i18n/SUPPORTED is en_GB.UTF-8 and it
> is displayed when running 'dpkg-reconfigure locales', so you should
> use this one.
>
> And yours:
>
> Glibc uses "canonical locale name" with lower-case, hyphen-removed
> ... So using canonical name is the right way.
>
> So what I am asking is if Xlib should use utf8 following your
> reasoning or locale -a should report UTF-8 so its output is useful
> and users do not set env vars wrongly. You both are explaining the
> reasons for each problem, closing or wanting to close the bugs, but
> providing different solutions.
X11 uses the user representation name (UTF-8), because that is the
name users are expected to use. OTOH, locale -a displays glibc's
internal normalized name (utf8), because glibc has to accept not only
UTF-8 but also other possible user representation names such as utf8.
One example is ja_JP.eucJP vs ja_JP.ujis vs ja_JP.EUC-JP. eucJP is
the user representation name required by the Japanese locale standard
(2000-). ujis is the user representation name required by the old
Japanese locale standard (-1999). EUC-JP is the user representation
name required by the LSB and glibc. However, glibc has to treat all
of them as the same locale name, so internally it uses the normalized
name "eucjp".
The "user representation name" and the "normalized name" are two
different things. I think this difference is what you're discussing.
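
To make the normalization concrete, here is a rough sketch of the
rule described above (lower-case the codeset, drop hyphens and other
non-alphanumeric characters). The function name normalize_locale is
made up for illustration, and the rule is only an approximation of
what glibc does internally. It does not cover names like ujis, which
cannot be derived from eucJP by character-level normalization.

```shell
#!/bin/sh
# normalize_locale NAME: print NAME with its codeset part normalized.
# Assumption: normalization means "keep only alphanumeric characters
# of the codeset and lower-case them"; the language/territory part and
# any @modifier are left untouched. This is a sketch, not glibc's code.
normalize_locale () {
    name=$1
    case $name in
        *.*) ;;
        *) printf '%s\n' "$name"; return 0 ;;   # no codeset part, e.g. "C"
    esac
    base=${name%%.*}          # e.g. en_GB
    rest=${name#*.}           # e.g. UTF-8 or ISO-8859-15@euro
    case $rest in
        *@*) codeset=${rest%%@*}; modifier=@${rest#*@} ;;
        *)   codeset=$rest; modifier= ;;
    esac
    # Drop everything except ASCII letters and digits, then lower-case.
    codeset=$(printf '%s' "$codeset" |
        LC_ALL=C tr -cd 'A-Za-z0-9' |
        LC_ALL=C tr 'A-Z' 'a-z')
    printf '%s.%s%s\n' "$base" "$codeset" "$modifier"
}
```

With this sketch, "normalize_locale en_GB.UTF-8" prints en_GB.utf8,
and both ja_JP.eucJP and ja_JP.EUC-JP come out as ja_JP.eucjp.
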
> Or in other words, which is the unified value for all apps in a Debian
> system, that all apps should support (technical issue) and also report
> (cosmetic issue), and thus which of the two bugs is the right one,
> maybe expanding to other parts of the Debian system, so they also
> match.
Hmm, I think it's a difficult problem, and I think the user
representation name should be free from such a limitation.
I think the real problem is that the system does not provide a
function to answer "are en_GB.UTF-8 and en_GB.utf8 the same locale?".
One way to check whether two locales are the same:
> cat localeident.sh
#!/bin/sh
#
# localeident.sh - check whether two locale names refer to the same locale
# (C) 2004 GOTO Masanori <gotom@debian.org>
# GPL
if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $0 localename1 localename2"
    exit 2
fi
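func () {
    # Capture stderr so that a missing or broken locale can be detected.
    file=$(mktemp)
    export LC_ALL="$LOC"
    LOCRET=$(locale -c category 2>"$file")
    export LC_ALL=C
    if [ -s "$file" ]; then
        rm -f "$file"
        echo error while opening locale $LOC
        exit 2
    fi
    rm -f "$file"
    LOCRET=$(echo $LOCRET | sed 's/LC_IDENTIFICATION //g')
    # An empty result means no locale information was returned.
    if [ -z "$LOCRET" ]; then
        echo cannot get locale information $LOC
        exit 2
    fi
}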
LOC=$1
func
L1=$LOCRET
LOC=$2
func
L2=$LOCRET
if [ "$L1" = "$L2" ]; then
    echo locale "$1" and "$2" are the same locale information
    exit 0
else
    echo locale "$1" and "$2" are the different locale information
    exit 1
fi
> ./localeident.sh ja_JP.eucJP ja_JP.EUC-JP
locale ja_JP.eucJP and ja_JP.EUC-JP are the same locale information
> echo $?
0
> ./localeident.sh ja_JP.eucJP ja_JP.eucjp
locale ja_JP.eucJP and ja_JP.eucjp are the same locale information
> echo $?
0
> ./localeident.sh ja_JP.eucJP ja_JP.UTF-8
locale ja_JP.eucJP and ja_JP.UTF-8 are the different locale information
> echo $?
1
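
For the original question (matching a user representation name
against what locale -a prints), a lighter name-only check may be
enough. The following is just a sketch under the same assumption
about normalization (lower-case codeset, non-alphanumerics removed).
norm_name and locale_in_list are hypothetical helpers, not anything
glibc provides, and unlike localeident.sh they compare names only,
not locale data.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of glibc or
# localeident.sh.

# norm_name NAME: print NAME with its codeset part lower-cased and
# stripped of non-alphanumeric characters (assumed normalization rule).
norm_name () {
    case $1 in
        *.*) ;;
        *) printf '%s\n' "$1"; return 0 ;;   # no codeset, e.g. "C"
    esac
    n_base=${1%%.*}
    n_rest=${1#*.}
    case $n_rest in
        *@*) n_mod=@${n_rest#*@}; n_rest=${n_rest%%@*} ;;
        *)   n_mod= ;;
    esac
    n_rest=$(printf '%s' "$n_rest" |
        LC_ALL=C tr -cd 'A-Za-z0-9' |
        LC_ALL=C tr 'A-Z' 'a-z')
    printf '%s.%s%s\n' "$n_base" "$n_rest" "$n_mod"
}

# locale_in_list NAME: read locale names (one per line) on stdin and
# succeed if any of them normalizes to the same string as NAME.
locale_in_list () {
    want=$(norm_name "$1")
    while IFS= read -r candidate; do
        if [ "$(norm_name "$candidate")" = "$want" ]; then
            return 0
        fi
    done
    return 1
}
```

It would be used like "locale -a | locale_in_list en_GB.UTF-8", which
succeeds when locale -a lists the locale under either spelling.
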
(If you think it's worthwhile that glibc includes this shell script,
please let me know)
Regards,
-- gotom