
Re: non-ASCII characters in /etc/locales.alias ?

On Wed, 2002-01-23 at 05:52, Tomohiro KUBOTA wrote:
> Hi,
> At 16 Jan 2002 13:48:46 +0000,
> Alastair McKinstry wrote:
> > I've been looking at /etc/locales.alias and the possibility of
> > auto-generating it from locale-gen; and noticed that it has non-ASCII
> > characters in it: in particular in 
> > 
> > 	bokmål		no_NO.ISO-8859-1
> > 	français	fr_FR.ISO-8859-1
> > 
> > I think using non-ASCII characters in /etc/locales.alias is dodgy; it
> > would break in non- ISO-8859-1 environments. Should this be supported?
> > Should /etc/locales.alias have a tag describing its encoding?
> > (e.g. an emacs-type tag) 
> > Does anyone use these aliases?
> > 
> > /uhsr/X11R6/lib/X11/locales/locales.alias says these are defined on
> > HPUX; are they in real use?
> > 
> > Opinions?
> How do you think about the discussion from the following mail?
> http://mail.gnu.org/pipermail/bug-glibc/2002-January/004481.html
> ---
> Tomohiro KUBOTA <kubota@debian.org>
> http://www.debian.or.jp/~kubota/
> "Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
Thanks for the pointer; I hadn't had a chance to visit bug-glibc and see
the discussion. 

First, the basics: I think the user should select the locale that
they primarily use via a GUI, or a script like tzselect, as pointed
out by Ulrich; in the Debian case a debconf-type interface would be most
consistent. This would then set their LANG and LC_ALL variables to the
canonical form described in the POSIX specs and the Li18nux locale draft.
(In the case of the system, rather than a particular user, the locale
variable gets set in a shell config file, /etc/sysconfig/i18n. Maybe we
should do something similar in Debian; I am not yet familiar enough with
Debian to know the 'right place'.)

The user should not be messing about with locale aliases. In this sense,
I agree with Ulrich; it should just "go away". It is there for
compatibility with previous and other locale names: people who have
"LANG=french", for example, in their config scripts. In particular, it
is useful for people who have /home mounted across multiple machines
(and OSes, as I do at work). People need to be able to log in to Linux
and have things "just work", and this is why I was against removing
"bokmål" and "français" from the file.

Ulrich is also right that they are "byte sequences", and as their
presence in the file does not break a system, they are in one sense
"valid". The user never _needs_ to type them in from the locale; it
would be preferable if the user typed "LANG=fr_FR.ISO-8859-15@euro".
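
To make the "byte sequences" point concrete, here is a small Python
sketch (mine, not anything from glibc) showing how the same stored bytes
read under different encodings. The helper name shown_as is made up for
illustration, and I am assuming the alias is stored as ISO-8859-1 bytes:

```python
# The alias as it would sit in the file, assuming ISO-8859-1 storage.
raw = "français".encode("iso-8859-1")


def shown_as(data, encoding):
    """Decode the bytes the way a tool using `encoding` would display them.

    Undecodable bytes become U+FFFD, much like the "?" a terminal shows.
    """
    return data.decode(encoding, errors="replace")


for enc in ("iso-8859-1", "ascii", "utf-8"):
    print(enc, repr(shown_as(raw, enc)))
```

The bytes themselves never change; only the ISO-8859-1 reading gives
back the name the user expects.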

However, the file is confusing, and misleads the user into thinking that
"fran?ais" or "bokml" (if that is how they are shown in the "wrong"
locale) is a valid locale name; they may type those in as locale names,
and things break. Also, if they do "locale -a" they will be shown these
"locales", which are invalid, and hence not really available.

This is why I brought the subject up in the context of locale-gen: I
believe the whole locales.alias file should be autogenerated by
locale-gen. Because not only is "fran?ais" an invalid locale, if I
haven't got the locale files on my system generated by locale-gen, then
"LANG=german" is also invalid, and should not be shown as a valid entry
in /etc/locales.alias or "locale -a". These should show "all available
locales", as described in the man page (on Linux, and similarly on
Solaris, etc. looking around here..)
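
As a rough sketch of what I mean (Python for brevity; this is not how
locale-gen works today, and the function names and sample data are
invented for illustration), the generator would keep only those aliases
whose target locale has actually been generated. I am assuming the usual
"alias target" two-column format with "#" comments, and exact matching
of target names:

```python
def parse_aliases(text):
    """Return (alias, target) pairs from locale.alias-style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        fields = line.split()
        if len(fields) == 2:
            entries.append((fields[0], fields[1]))
    return entries


def usable_aliases(entries, generated):
    """Keep only aliases whose target locale has actually been generated."""
    return [(alias, target) for alias, target in entries if target in generated]


sample = """\
# alias         target
french          fr_FR.ISO-8859-1
german          de_DE.ISO-8859-1
"""
generated = {"fr_FR.ISO-8859-1"}  # hypothetical: only French was generated
print(usable_aliases(parse_aliases(sample), generated))
```

With only the French locale generated, "german" drops out of the list,
which is the behaviour I would expect from the file and from "locale -a".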

I think locale-gen should generate these entries. It should also include the
"français" and "bokmål" entries, as they may be needed (when French and
Norwegian locales are generated), but probably issue warnings that they
are for backward compatibility and should not be used. The locale.alias
file would then have a header, saying it is autogenerated and users
should not edit it, but mail new entries if necessary to the appropriate

Hence the "-*- coding: ISO-8859-1 -*-" tag I suggested for locale.alias.
Yes, it's Emacs-specific, but it gets the point across and at least makes
one editor work right. It's a workaround for a misfeature; the
latin1-encoded locale names are misfeatures, and cannot cleanly be
fixed, only worked around for compatibility. We should design our future
systems (set-language-env, etc.) not to generate such cruft. I also
think that, if we are changing /etc/locales.alias, we should unify it
with /usr/X11/lib/X11/locale/locale.alias, and make sure that X11 works
primarily with the POSIX locale names, with "canonical" charset names
(e.g. ISO-8859-1, rather than ISO8859-1, etc.).
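
For the charset part, something along these lines would do (a Python
sketch only; the function names are mine, and it deliberately handles
nothing but the ISO-8859-n family, leaving every other charset name
untouched):

```python
import re

# Matches spellings such as ISO8859-1, iso88591, ISO_8859-15.
_ISO8859 = re.compile(r"^iso[-_]?8859[-_]?(\d+)$", re.IGNORECASE)


def canonical_charset(name):
    """Rewrite ISO-8859-n spellings to the hyphenated form; pass others through."""
    match = _ISO8859.match(name)
    return f"ISO-8859-{match.group(1)}" if match else name


def canonical_locale(locale_name):
    """Canonicalise the charset in language_TERRITORY.charset@modifier."""
    base, at, modifier = locale_name.partition("@")
    lang, dot, charset = base.partition(".")
    if dot:
        base = f"{lang}.{canonical_charset(charset)}"
    return base + at + modifier


print(canonical_locale("fr_FR.ISO8859-1"))
print(canonical_locale("fr_FR.iso885915@euro"))
```

Names with no charset, such as "fr_FR" or "fr_FR@euro", are returned
unchanged.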

One place where "non-conforming" locale names are useful, BTW, is in
idioms such as
$ LANG=french evolution
etc. when we want to run a program in another locale, and conformant
locale names are a pain to type. However, we can easily generate such
names from the locale files (type "locale language" to get the
English-language name of a locale) and add such locale aliases to

BTW, I like the set-language-env package that you created, and think it
should be expanded with a debconf-type interface, and if run as root,
generate the default system locale. Any thoughts?

Alastair McKinstry
------------------
Alastair McKinstry, Silicon Design Group, 3Com.
Key fingerprint = 81ED 5787 E579 BA78 99FD  8243 CA18 BDC2 FCA7 8FFF
