Re: non-ASCII characters in /etc/locale.alias?
On Thu, 2002-01-17 at 14:25, Tomohiro KUBOTA wrote:
> Hi,
>
> At 17 Jan 2002 10:21:20 +0000,
> Alastair McKinstry wrote:
>
> > (1) in /etc/locale.alias, we start with the line
> >
> > # -*- coding: iso-8859-1 -*-
> > # Locale database
> >
> > to signal that the file is encoded in ISO-8859-1. If the encoding is
> > changed, this line must be changed to match.
> > This should be documented in the locale.alias manpage (currently in
> > the BTS).
>
> I propose that this be included in the /usr/share/doc/locales/examples/
> directory. The default /etc/locale.alias should be ASCII-only,
> though I agree some people will need compatibility with
> non-internationalized systems (i.e., those with ISO-8859-1 in
> /etc/locale.alias). Such people will need to read some documentation
> (a manpage, README.Debian or anything else is fine) and set up their
> own /etc/locale.alias.
>
Can you explain a bit more?
The point of doing it this way is to avoid breaking existing compatibility;
the -*- hint will be ignored where it is not understood (and, as I
understand it, it would fix the 8-bit editing problem in JP: Emacs would
just open the file in ISO-8859-1 mode).
I don't want to break existing functionality (which removing
bokmal/francais would do); we recommend that users use ASCII
only.
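To illustrate why the tag costs nothing for tools that ignore it and little for tools that honour it, here is a minimal bash sketch that reads the declared encoding from the first line. The function name alias_file_encoding and the "ASCII" fallback are assumptions for illustration only, not part of any existing tool, and the pattern is a simplified reading of the Emacs "-*- coding: ... -*-" convention.

```bash
#!/bin/bash
# Hypothetical helper (not an existing tool): print the encoding named by an
# Emacs-style "-*- coding: NAME -*-" tag on the first line of FILE, or
# "ASCII" when there is no tag (an assumed default).
alias_file_encoding() {
    local first=''
    # read fails on an empty file or a missing final newline; keep going.
    IFS= read -r first < "$1" || true
    # Simplified pattern: "-*-", then "coding:", then the name up to
    # whitespace or ";".
    local re='-\*-.*coding:[[:space:]]*([^[:space:];]+)'
    if [[ $first =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        printf 'ASCII\n'
    fi
}

# Usage:
#   alias_file_encoding /etc/locale.alias
```

A program that knows nothing about the tag just sees an ordinary comment line, which is the backward-compatibility property argued for above.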
>
> > (2) We recommend (again in the manpage) that locale aliases be ASCII-only,
> > because of the can't-easily-enter-an-alias-while-in-a-conflicting-locale
> > problem.
>
> I see. (Not "can't easily", but "theoretically impossible".)
>
You could do (in bash)
export LC_ALL=$'\x66\x72\x61\x6e\xe7\x61\x69\x73'
to set it to français (Latin-1 encoding) while in UTF-8 mode.
(Years of doing physics and computer security/sysadmin work have taught me
to be very careful when using words like "impossible".)
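To make the byte-level point concrete: the escape sequence above is the Latin-1 spelling of the name, which is a different byte string from the UTF-8 spelling of the same word. A small bash sketch, assuming the usual `iconv` and `wc` command-line tools are installed:

```bash
#!/bin/bash
# "français" spelled byte by byte in ISO-8859-1 (0xE7 is "ç" in Latin-1),
# the same escapes as in the command above.
latin1_name=$'\x66\x72\x61\x6e\xe7\x61\x69\x73'

# The same word in UTF-8, where "ç" is the two-byte sequence 0xC3 0xA7.
utf8_name=$'\x66\x72\x61\x6e\xc3\xa7\x61\x69\x73'

# Count bytes rather than characters; $(( )) drops any padding that
# some wc implementations print.
byte_count() { echo $(( $(printf '%s' "$1" | wc -c) )); }

echo "Latin-1 spelling: $(byte_count "$latin1_name") bytes"   # 8
echo "UTF-8 spelling:   $(byte_count "$utf8_name") bytes"     # 9

# Re-encoding the Latin-1 bytes with iconv yields the UTF-8 spelling,
# so the two names are the same word but never the same string.
converted=$(printf '%s' "$latin1_name" | iconv -f ISO-8859-1 -t UTF-8)
if [ "$converted" = "$utf8_name" ]; then
    echo "iconv output equals the UTF-8 spelling"
fi
```

This is why typing "français" directly on a UTF-8 terminal does not match a Latin-1 alias, while the explicit escapes do.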
>
> > However, for backward compatibility we will support the existing
> > aliases: people with LC_ALL=bokmål, for example, will want their
> > systems to continue working. We can't easily upgrade our way out of this
> > problem; users telnetting or sshing from other Linux boxes (or HP-UX,
> > where these locale aliases started) will not want their displays
> > broken.
>
> I propose that an ISO-8859-1 version of /etc/locale.alias be
> provided in the /usr/share/doc/locales/examples/ directory. The
> manpage can explain how to use that file. The
> default /etc/locale.alias should not contain ISO-8859-1 locale
> names.
>
> The reason is that, if ISO-8859-1 locale names are usable in
> the default settings, new users (who don't need to care about
> compatibility with old systems) may start using those names.
> This should be avoided because (1) it is simply wrong,
> (2) they will come to depend on a system which cannot co-exist with
> international users, and will come to see i18n as an annoyance,
> and (3) it adds one more obstacle to their eventual migration
> to UTF-8. Use of ISO-8859-1 locale names should be limited
> to people who _really_ need compatibility with old systems,
> who have read the instructions and notices, and who know what
> they are doing.
>
>
> > (3) 'locale' gets changed to support the coding tag. This fixes the bug
> > where
> > $ unicode_start ; export LC_ALL=en_US.utf8
> > $ locale -a
> > lists 'bokml' rather than 'bokmål', for example.
>
> I think this is not needed, because "français" and "bokmål" are
> exceptions: illegal makeshifts kept for compatibility with old
> non-internationalized systems.
>
> I don't hold this opinion strongly. If someone wants to develop this,
> I won't stop him/her. However, please note that this is more work
> than many people imagine. For example, what would the
> "legal" encoding names be? GNU libc names, GNU Emacs names, and
> MIME names are different. Not only the names but the actual encodings
> differ. Li18nux people are currently trying to define a
> standard set of encoding names. It would not be too late to wait
> until that standard is released.
>
Thanks for the pointer to Li18nux; I should check that out. I would hope
people would just use the POSIX (libc) names rather than invent new
schemes. Really, users should be able to select their locale once,
through a friendly method such as a GUI, and then have the locale name
set for them: e.g. show the localised language names in a list, let the
user click one, and use no aliases at all.
> I just don't think such work is worth doing for the sake of compatibility
> with two dirty locale names, "bokmål" and "français".
>
> If this "improvement" of "locale" would let us use _any_ number of
> encodings we like at the same time, it would be nice and might be worth
> doing. However, the "improvement" only lets us use
> _one_ encoding of our choice.
>
Well, changing the first line to "coding: utf-8" would allow any
characters, but not multiple encodings.
The proposed improvement is a minor one, agreed. I don't propose that we
should use multiple encodings in locale alias names; that way madness
lies.
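For illustration only, a minimal bash sketch of what switching the tag to utf-8 would involve: re-encode the file and rewrite its first line to match. The function name is hypothetical, the sketch assumes the tag is spelled in lower case exactly as in the current file, and it writes to a separate output file. Note that it changes the bytes a Latin-1 user must type, which is exactly the compatibility cost discussed above.

```bash
#!/bin/bash
# Hypothetical helper: copy a locale.alias-style file SRC to DST,
# re-encoding it from ISO-8859-1 to UTF-8 and rewriting a lower-case
# "coding: iso-8859-1" tag on line 1 to "coding: utf-8".
# SRC itself is left untouched. Every byte is a valid ISO-8859-1
# character, so the conversion step itself cannot reject the content.
convert_alias_file_to_utf8() {
    local src=$1 dst=$2
    iconv -f ISO-8859-1 -t UTF-8 "$src" \
        | sed '1s/coding:[[:space:]]*iso-8859-1/coding: utf-8/' > "$dst"
}

# Usage (writes a new file for review rather than editing /etc directly):
#   convert_alias_file_to_utf8 /etc/locale.alias /tmp/locale.alias.utf8
```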
BTW, in the bug shown above,
LC_ALL=en_US.utf8 locale -a
how should we show "français", or should we show it at all? If we
convert it to UTF-8, it will display correctly, but the name shown will
not work as an alias (the locale code expects the Latin-1 spelling).
I propose we don't show it.
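A minimal sketch of the "don't show it" option, assuming the listing code reads the alias file directly: skip comments, blank lines and any alias name containing a non-ASCII byte. The function name is hypothetical and this is bash, not how the real `locale -a` is implemented.

```bash
#!/bin/bash
# Hypothetical helper: print the alias names (first field) of a
# locale.alias-style FILE, skipping comments, blank lines and any name
# that contains a byte outside 7-bit ASCII.
# The body is a subshell "( ... )", so the LC_ALL=C used for byte-wise
# matching does not leak into the caller.
list_ascii_aliases() (
    export LC_ALL=C
    while read -r name _; do
        [[ -z $name || $name == \#* ]] && continue    # blank line or comment
        [[ $name == *[![:ascii:]]* ]] && continue     # non-ASCII name: hide it
        printf '%s\n' "$name"
    done < "$1"
)

# Usage:
#   list_ascii_aliases /etc/locale.alias
```

Hiding the name avoids printing something that either displays as garbage or displays correctly but cannot be typed back as a working alias.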
- Alastair
> ---
> Tomohiro KUBOTA <kubota@debian.org>
> http://www.debian.or.jp/~kubota/
> "Introduction to I18N" http://www.debian.org/doc/manuals/intro-i18n/
>
--
Alastair McKinstry,