
Re: non-ASCII characters in /etc/locales.alias ?



Hi,

At 19 Jan 2002 03:25:58 +0100,
Tollef Fog Heen wrote:

> | I don't understand what you mean by this.  You mean, what is "wrong"?
> 
> LANG can be unset or set to POSIX, still you would be able to input
> for instance Norwegian characters without problem.

Well, xterm and much other software appear to be able to use
"ISO-8859-1" even in the C/POSIX locale.  This is because such software
doesn't support locales at all.  It was developed by people who
didn't know about i18n.  In such software, all characters are
treated as mere byte sequences, not as characters with an encoding
specification.  If you can input them, it is a relic from the old
non-i18n era and should be regarded as a bug in modern i18n computing.
(However, many such relics survive today because most developers and
users don't understand i18n, LC_CTYPE, encodings, and coded
character sets.  Thus, we cannot simply "fix" these bugs, because
such i18n-novice people would complain.)
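
To see which encoding a locale actually implies, one can ask the C
library.  The following is only a minimal Python sketch: codeset_of
is a hypothetical helper name of mine, and nl_langinfo exists only on
POSIX-like systems.  On glibc the C locale typically reports an ASCII
codeset, but the exact string varies by platform, so I show no output.

```python
import locale

def codeset_of(locale_name: str) -> str:
    """Return the character encoding that LC_CTYPE implies for locale_name."""
    # Calling setlocale() with no value only queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        # CODESET is the encoding a locale-aware program should assume.
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Restore the previous setting so the caller is not affected.
        locale.setlocale(locale.LC_CTYPE, saved)

print("C locale codeset:", codeset_of("C"))
```

Software that never asks this question cannot know what its bytes mean.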


> | And, the output of "locale" command means what in this context?
> | (Of course I understand what "locale" command outputs.)
> 
> It shows that I am able to input Norwegian (and French) characters
> without configuring a locale.

What you see depends on your font.  Your software doesn't
recognize that your characters are ISO-8859-1.  They are just 8-bit
byte sequences used as indices into your favorite font.  This is the
old way, from before the i18n era.
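
As a small illustration (a sketch in Python, with interpret being a
hypothetical helper name), the same four bytes from your mail become
different characters, or no characters at all, depending on which
encoding is assumed.  The list of encodings is my own choice for the
example.

```python
# The four 8-bit bytes discussed in this thread.
RAW = bytes([0xE6, 0xF8, 0xE5, 0xE7])

def interpret(raw: bytes, encoding: str) -> str:
    """Decode raw bytes under an explicitly named encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # The bytes have no meaning in this encoding.
        return "<not valid in %s>" % encoding

for enc in ("ascii", "iso-8859-1", "iso-8859-2", "koi8-r"):
    # ascii() keeps the output printable even on an ASCII-only terminal.
    print(enc, "->", ascii(interpret(RAW, enc)))
```

Only the iso-8859-1 reading gives the Norwegian and French letters
you intended; without a declared encoding the bytes are ambiguous.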


> | >  What you need to do is configure your keymap properly.
> | 
> | This is wrong, because keymap is not enough for Japanese input.
> | Well, you cannot configure keymap to input Japanese.
> 
> You can for most other languages.

Please don't treat languages differently.  Am I a pitiable exception?


> | BTW, the contents of your mail was illegal encoding ... It
> | contained my ISO-2022-JP-encoded Japanese and your 8bit
> | characters (0xe6, 0xf8, 0xe5, 0xe7), though the mail header
> | insists the contents is ISO-8859-1.  Of course, ISO-2022-JP-
> | encoded JIS X 0208 characters in ISO-8859-1 encoding is
> | illegal.
> 
> No, they are not illegal, they just don't represent what you thought
> they would.  That is, ESC (0x1b) is a perfectly legal character which can be
> represented using ISO-8859-1.  The other characters were ASCII.

No.  Use of such ISO-2022 escape sequences is illegal in the ISO-8859-1
encoding.  (Please note that I am talking about the specific escape
sequences you used, not about the use of 0x1b.)  Please study encodings
and coded character sets.
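
A rough way to detect this kind of mixture is sketched below in
Python.  The function name check_latin1_body and the sample text are
hypothetical; the escape sequences are the four that ISO-2022-JP
(RFC 1468) uses to switch character sets.  Note that Python's own
iso-8859-1 decoder accepts every byte value, so it will not complain
by itself and an explicit check like this is needed.

```python
import re

# ESC $ @ and ESC $ B select JIS X 0208 (1978 and 1983 versions);
# ESC ( B selects ASCII and ESC ( J selects JIS X 0201 Roman.
ISO2022JP_ESCAPES = re.compile(rb"\x1b(?:\$[@B]|\([BJ])")

def check_latin1_body(body: bytes) -> list:
    """Return the ISO-2022-JP escape sequences found in body."""
    return [m.group(0) for m in ISO2022JP_ESCAPES.finditer(body)]

# Sample data: Japanese text in ISO-2022-JP followed by raw 8-bit bytes.
japanese = "日本語".encode("iso2022_jp")
norwegian = bytes([0xE6, 0xF8, 0xE5])
mixed = japanese + b" " + norwegian

print(check_latin1_body(mixed))      # escape sequences are reported
print(check_latin1_body(norwegian))  # nothing to report
```

A body for which this list is non-empty cannot be honestly labeled
ISO-8859-1, whatever the mail header says.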


> | (I imagine your 0xe6 0xf8 0xe5 0xe7 sequence in your mail is
> | intended to be ISO-8859-1, I imagined from your mail header.
> 
> Since my header shows that the body of the mail was in latin1 and I
> input those character, that is a reasonable assumption.

Since you used ISO-2022 sequences in the ISO-8859-1 encoding, it is
natural for me to imagine that you don't care about encodings.


Anyway, use of ISO-8859-1 characters in /etc/locale.alias is illegal.
Only people who think ISO-8859-1 is the only encoding in the world, or
people who don't care about foreign people, would think of such an
illegal thing, though it might have been legal in the old non-i18n era.

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


