Bug#284137: locale -a reports misleading values for UTF-8 locales

To: "Guillermo S. Romero" <gsromero@alumnos.euitt.upm.es>, 284137@bugs.debian.org
Cc: GOTO Masanori <gotom@debian.or.jp>, barbier@linuxfr.org
Subject: Bug#284137: locale -a reports misleading values for UTF-8 locales
From: GOTO Masanori <gotom@debian.or.jp>
Date: Mon, 06 Dec 2004 10:38:28 +0900
Message-id: <81eki4rysr.wl@omega.webmasters.gr.jp>
Reply-to: GOTO Masanori <gotom@debian.or.jp>, 284137@bugs.debian.org
In-reply-to: <20041204181324.A3154@blitzkrieg.battleship>
References: <20041203235521.A5821@blitzkrieg.battleship> <81mzwurvu1.wl@omega.webmasters.gr.jp> <20041204164455.A2929@blitzkrieg.battleship> <81hdn2rr2f.wl@omega.webmasters.gr.jp>

At Sat, 4 Dec 2004 18:13:24 +0100,
Guillermo S. Romero <gsromero@alumnos.euitt.upm.es> wrote:
> Denis said I have to use UTF-8 so Xlib works, ignoring what locale -a
> reports, and you say utf8 is the right value, but that will make Xlib
> not work. Denis's reasoning:
> 
>   The name listed in /usr/share/i18n/SUPPORTED is en_GB.UTF-8 and it
> is displayed when running 'dpkg-reconfigure locales', so you should
> use this one.
> 
> And yours:
> 
>   Glibc uses "canonical locale name" with lower-case, hyphen-removed
> ...  So using canonical name is the right way.
> 
> So what I am asking is if Xlib should use utf8 following your
> reasoning or locale -a should report UTF-8 so its output is useful
> and users do not set env vars wrongly. You both are explaining the
> reasons for each problem, closing or wanting to close the bugs, but
> providing different solutions.

X11 uses the user representation name (UTF-8), because that is the
name users are expected to use.  OTOH, locale -a displays glibc's
internal normalized name (utf8), because glibc has to accept not only
UTF-8 but also other possible user representation names such as utf8.

One example is ja_JP.eucJP vs ja_JP.ujis vs ja_JP.EUC-JP.  eucJP is
the user representation name required by the Japanese locale standard
(2000-).  ujis is the user representation name required by the old
Japanese locale standard (-1999).  EUC-JP is the user representation
name required by the LSB and glibc.  However, glibc has to treat all
of them as the same locale name, so internally it uses the normalized
name "eucjp".

The "user representation name" and the "normalized name" are two
different things.  I think this difference is what you're discussing.
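
To make the normalization concrete, here is a rough sketch of the
"lower-case, hyphen-removed" rule applied to the codeset part of a
name.  normalize_locale is a hypothetical helper written for this
mail, not something glibc ships, and it only approximates what glibc
does: it handles case and punctuation, and it does not map other
names such as ujis.

```shell
#!/bin/sh
# normalize_locale NAME - hypothetical helper, not part of glibc.
# Lower-cases the codeset part of NAME and strips non-alphanumeric
# characters from it; language, territory and @modifier are kept.
normalize_locale () {
        name=$1
        case $name in
        *.*) ;;
        *) printf '%s\n' "$name"; return 0 ;;   # no codeset part
        esac
        lang=${name%%.*}                        # e.g. en_GB
        rest=${name#*.}                         # e.g. UTF-8 or ISO-8859-15@euro
        case $rest in
        *@*) codeset=${rest%%@*}; modifier=@${rest#*@} ;;
        *)   codeset=$rest; modifier= ;;
        esac
        codeset=$(printf '%s' "$codeset" |
                LC_ALL=C tr -cd '[:alnum:]' |
                LC_ALL=C tr '[:upper:]' '[:lower:]')
        printf '%s.%s%s\n' "$lang" "$codeset" "$modifier"
}

normalize_locale en_GB.UTF-8    # prints en_GB.utf8
normalize_locale ja_JP.EUC-JP   # prints ja_JP.eucjp
normalize_locale ja_JP.eucJP    # prints ja_JP.eucjp
```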

> Or in other words, which is the unified value for all apps in a Debian
> system, that all apps should support (technical issue) and also report
> (cosmetic issue), and thus which of the two bugs is the right one,
> maybe expanding to other parts of the Debian system, so they also
> match.

Hmm, I think it's a difficult problem - and I think the user
representation name should be free from such a limitation.

I think the problem is that the system does not provide a function to
answer "are en_GB.UTF-8 and en_GB.utf8 the same locale?".  One way to
check whether two locales are the same:

	> cat localeident.sh 
	#!/bin/sh
	#
	# localeident.sh - check whether two locales are exactly the same locale
	# (C) 2004 GOTO Masanori <gotom@debian.org>
	# GPL
	
	if [ -z "$1" ] || [ -z "$2" ]; then
	        echo "Usage: $0 localename1 localename2"
	        exit 2
	fi
	
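	func () {
	        file=$(mktemp) || exit 2
	        # Query the locale under test; stderr goes to a temp file.
	        export LC_ALL="$LOC"
	        LOCRET=$(locale -c category 2>"$file")
	        export LC_ALL=C
	        # Anything on stderr means the locale could not be loaded.
	        if [ -s "$file" ]; then
	                rm -f "$file"
	                echo "error while opening locale $LOC"
	                exit 2
	        fi
	        rm -f "$file"
	        # $LOCRET is left unquoted on purpose, to join the lines.
	        LOCRET=$(echo $LOCRET | sed 's/LC_IDENTIFICATION //g')
	        if [ -z "$LOCRET" ]; then
	                echo "cannot get locale information $LOC"
	                exit 2
	        fi
	}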
	
	LOC=$1
	func
	L1=$LOCRET
	
	LOC=$2
	func
	L2=$LOCRET
	
	if [ "$L1" = "$L2" ]; then
	        echo locale "$1" and "$2" are the same locale information
	        exit 0
	else
	        echo locale "$1" and "$2" are the different locale information
	        exit 1
	fi

	> ./localeident.sh ja_JP.eucJP ja_JP.EUC-JP
	locale ja_JP.eucJP and ja_JP.EUC-JP are the same locale information
	> echo $?
	0
	> ./localeident.sh ja_JP.eucJP ja_JP.eucjp
	locale ja_JP.eucJP and ja_JP.eucjp are the same locale information
	> echo $?
	0
	> ./localeident.sh ja_JP.eucJP ja_JP.UTF-8
	locale ja_JP.eucJP and ja_JP.UTF-8 are the different locale information
	> echo $?
	1
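
For the practical question of whether a name like en_GB.UTF-8 is
covered by what locale -a prints, a lighter check is to normalize both
sides before comparing.  The following is only a sketch:
normalize_names and locale_listed are hypothetical helpers, and the
normalization is the same approximation of the "lower-case,
hyphen-removed" rule, not glibc's actual code.

```shell
#!/bin/sh
# normalize_names - hypothetical filter: for each locale name on stdin,
# lower-case the codeset part and drop non-alphanumeric characters.
normalize_names () {
        LC_ALL=C awk '{
                dot = index($0, ".")
                if (dot == 0) { print; next }   # no codeset part
                lang = substr($0, 1, dot - 1)
                rest = substr($0, dot + 1)
                mod = ""
                at = index(rest, "@")
                if (at > 0) {                   # keep @modifier untouched
                        mod = substr(rest, at)
                        rest = substr(rest, 1, at - 1)
                }
                gsub(/[^A-Za-z0-9]/, "", rest)
                print lang "." tolower(rest) mod
        }'
}

# locale_listed NAME - hypothetical helper: succeed if NAME matches a
# line of `locale -a` once both sides are normalized.
locale_listed () {
        want=$(printf '%s\n' "$1" | normalize_names)
        locale -a 2>/dev/null | normalize_names | grep -Fqx -- "$want"
}

name=en_GB.UTF-8
if locale_listed "$name"; then
        echo "$name is listed by locale -a"
else
        echo "$name is not listed by locale -a"
fi
```

This only compares names; unlike localeident.sh it does not load the
locales, so it cannot tell whether two differently named locales carry
the same data.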

(If you think it would be worthwhile for glibc to include this shell
script, please let me know.)

Regards,
-- gotom
