Re: non-ASCII characters in /etc/locale.alias?
On Thu, 2002-01-17 at 14:25, Tomohiro KUBOTA wrote:
> Hi,
> 
> At 17 Jan 2002 10:21:20 +0000,
> Alastair McKinstry wrote:
> 
> > (1) in /etc/locale.alias, we start with the line
> > 
> > # -*- coding: iso-8859-1 -*-
> > # Locale database
> > 
> > to signal that the file is encoded in ISO-8859-1. If the encoding is
> > changed, this line must be updated to match.
> > This should be documented in the locale.alias manpage (currently
> > filed in the BTS).
> 
> I propose that this be included in the /usr/share/doc/locales/examples/
> directory.  The default /etc/locale.alias should be ASCII only,
> though I agree some people will need compatibility with
> non-internationalized (i.e., ISO-8859-1 in /etc/locale.alias)
> systems.  I think such people will need to read some documentation
> (a manpage, README.Debian, or anything else is fine) and set up their
> /etc/locale.alias themselves.
>
Can you explain a bit more?
The point of doing it this way is to avoid breaking existing compatibility;
the -*- hint will be ignored where it is not understood (as I understand it,
it would also fix the 8-bit editing problem in JP: Emacs would just open
the file in ISO-8859-1 mode).
I don't want to break existing functionality (which removing
bokmal/francais would do); we recommend that users use ASCII
only.
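A minimal sketch of how a tool could honour the -*- hint: read the Emacs-style coding tag from line 1 of a locale.alias-format file. The function name alias_coding and the fallback to ASCII when no tag is present are assumptions for illustration, not existing glibc or Debian behaviour.

```sh
# Hypothetical helper: print the encoding declared by an Emacs-style
# "-*- coding: NAME -*-" tag on line 1 of a locale.alias-format file.
# Assumption: a file without a tag is treated as ASCII.
alias_coding() {
    ac_coding=$(sed -n '1s/.*-\*-[[:space:]]*coding:[[:space:]]*\([^[:space:]]*\)[[:space:]]*-\*-.*/\1/p' "$1")
    printf '%s\n' "${ac_coding:-ASCII}"
}
```

For a file starting with the tag quoted above, `alias_coding /etc/locale.alias` would print iso-8859-1; tools that know nothing about the tag just see an ordinary comment line.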
> 
> > (2) We recommend (again in the manpage) that locale aliases be
> > ASCII-only, because of the can't-easily-enter-alias-while-in-conflicting-
> > locale problem.
> 
> I see.   (Not "can't easily", but "theoretically impossible".)
> 
You could do
export LC_ALL=$'\x66\x72\x61\x6e\xe7\x61\x69\x73'
to set it to français (Latin-1 encoding) while in UTF-8 mode, in a shell
with $'...' quoting such as bash, ksh93 or zsh.
(Years doing physics and computer security/sysadmin work have taught me to be
very careful when using words like "impossible"...)
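The $'\x..' form is ANSI-C quoting (bash, ksh93, zsh); csh's setenv has no equivalent. As a sketch, assuming the script itself is saved as UTF-8, here are two more portable ways to build the same eight ISO-8859-1 bytes from a UTF-8 session, plus a byte dump with od(1):

```sh
# printf(1) format strings accept octal escapes: \347 is 0xe7, which is
# "ç" in ISO-8859-1 (and an invalid lone byte in UTF-8).
name_octal=$(printf 'fran\347ais')

# Or convert the UTF-8 spelling with iconv(1); assumes this file is UTF-8.
name_iconv=$(printf '%s' 'français' | iconv -f UTF-8 -t ISO-8859-1)

# Dump the bytes: 66 72 61 6e e7 61 69 73
printf '%s' "$name_octal" | od -An -tx1

# Then, as with the $'...' example:
# LC_ALL=$name_octal; export LC_ALL
```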
> 
> > However, for backward compatibility we will support the existing
> > aliases: people with LC_ALL=bokmål, for example, will want their
> > systems to continue working. We can't easily upgrade our way out of this
> > problem; users telnetting or sshing in from other Linux boxes (or HP-UX,
> > where these locale aliases started) will not want their displays
> > broken.
> 
> I propose that an ISO-8859-1 version of /etc/locale.alias be
> provided in the /usr/share/doc/locales/examples/ directory.  The
> manpage can include instructions on how to use the file.  The
> default /etc/locale.alias should not contain ISO-8859-1 locale
> names.
> 
> The reason is that, if ISO-8859-1 locale names can be used in
> the default settings, new users (who don't need to care about
> compatibility with old systems) may want to use those locale names.
> This should be avoided because (1) it is simply the wrong thing,
> (2) they will come to depend on a system which cannot coexist with
> international users, and they will come to see i18n as something
> annoying, and (3) it will make migrating to UTF-8 one step harder
> for them.  Usage of ISO-8859-1 locale names should be limited
> to people who _really_ need compatibility with old systems,
> who have read the instructions and notices, and who know what they
> are doing.
> 
> 
> > (3) 'locale' gets changed to support the coding tag. This fixes the bug where
> > $ unicode_start ; export LC_ALL=en_US.utf8
> > $ locale -a
> > lists 'bokml' (the ISO-8859-1 byte for 'å' is invalid UTF-8 and is not
> > displayed) rather than 'bokmål', for example.
> 
> I think this is not needed because "français" and "bokmål" are
> exceptions: illegal makeshifts kept for compatibility with old
> non-internationalized systems.
> 
> I don't hold this opinion strongly.  If someone wants to develop this,
> I won't stop him/her.  However, please note that this work is more
> than many people imagine.  For example, what would the "legal"
> encoding names be?  GNU libc names, GNU Emacs names, and
> MIME names are different.  Not only the names but the actual encodings
> differ.  The Li18nux people are now trying to define a
> standard set of encoding names.  It would not be too late to wait
> until that standard is released.
> 
Thanks for the pointer to Li18nux; I should check that out. I would hope
people would just use the POSIX (libc) names rather than invent new
schemes. Really, users should just be able to select their locale once,
from a GUI etc. (a friendly method), and then have the locale name set;
e.g. you can put the localised language names in a list, the user clicks
one, and no aliases are needed.
> I just don't think such work is worth doing for the compatibility
> of two dirty locale names, "bokmål" and "français".
> 
> If this "improvement" of "locale" enabled us to use _any_ number of
> encodings we like at once, it would be nice and might be worth
> doing.  However, the "improvement" will only enable us to use
> _one_ encoding, whichever one we pick.
> 
Well, changing the first line to "coding: utf-8" would allow any
characters, but not multiple encodings.
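As an illustration of the "coding: utf-8" route, a sketch that writes a UTF-8 copy of an ISO-8859-1 alias file and updates the tag on line 1 to match. The function name is hypothetical, and it deliberately writes to a new file instead of editing /etc/locale.alias in place.

```sh
# Hypothetical: recode a Latin-1 locale.alias-format file to UTF-8 and
# rewrite the "coding:" value on line 1 so the tag stays accurate.
recode_alias() {
    iconv -f ISO-8859-1 -t UTF-8 "$1" |
        sed '1s/coding:[[:space:]]*[^[:space:]]*/coding: utf-8/' > "$2"
}
```

Note that recoding changes the bytes of "bokmål" and "français", so settings made with the old Latin-1 spellings would no longer match.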
The proposed improvement is a minor one, agreed. I don't propose that we
should use multiple encodings in locale alias names; that way madness
lies.
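For concreteness, a rough sketch of what change (3) could look like outside of locale itself: take the encoding from the line-1 coding tag and convert each alias name to the target charset with iconv(1), skipping names that cannot be converted. The function name, the ASCII default and the skip-on-failure policy are assumptions; this is not how glibc's locale -a is implemented.

```sh
# Hypothetical: list alias names from a locale.alias-format file,
# converted from the coding tag's encoding to the target charset
# (default: the current locale's charmap).
list_aliases() {
    la_from=$(sed -n '1s/.*-\*-[[:space:]]*coding:[[:space:]]*\([^[:space:]]*\)[[:space:]]*-\*-.*/\1/p' "$1")
    la_to=${2:-$(locale charmap)}
    # Alias lines are "NAME LOCALE"; skip comments and blank lines.
    while read -r la_name la_rest; do
        case $la_name in ''|\#*) continue ;; esac
        # Print the name only if iconv converts it without error.
        if la_out=$(printf '%s' "$la_name" | iconv -f "${la_from:-ASCII}" -t "$la_to" 2>/dev/null); then
            printf '%s\n' "$la_out"
        fi
    done < "$1"
}
```

In a UTF-8 session this would show "bokmål" correctly; with an ASCII target the two accented names are simply left out.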
BTW, in the bug shown above,
LC_ALL=en_US.utf8 locale -a
how should we show "français", or should we show it at all? If we
convert it to UTF-8 it will display correctly, but it won't work when
typed back in (the locale code expects the Latin-1 spelling). I propose
we don't show it.
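A sketch of that proposal, assuming a name is shown only when its raw bytes are already valid in the current charset, so that what the user sees is exactly what must be typed. Validity is tested with an identity conversion through iconv(1), which rejects invalid input; the function name is hypothetical.

```sh
# Hypothetical filter for locale -a style output: pass through only the
# names whose bytes are valid in the given charset (default: current
# charmap). A Latin-1 "français" is dropped in a UTF-8 session.
show_if_valid() {
    siv_charset=${1:-$(locale charmap)}
    while IFS= read -r siv_name; do
        # iconv -f X -t X exits non-zero on byte sequences invalid in X.
        if printf '%s' "$siv_name" | iconv -f "$siv_charset" -t "$siv_charset" >/dev/null 2>&1; then
            printf '%s\n' "$siv_name"
        fi
    done
}
```

Usage would be along the lines of `locale -a | show_if_valid`; in an ISO-8859-1 session every byte is valid, so the Latin-1 names still appear there.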
- Alastair
> ---
> Tomohiro KUBOTA <kubota@debian.org>
> http://www.debian.or.jp/~kubota/
> "Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
> 
-- 
Alastair McKinstry, 