Re: non-ASCII characters in /etc/locales.alias ?



On Thu, Jan 17, 2002 at 12:39:00AM +0900, Tomohiro KUBOTA wrote:
> > I've been looking at /etc/locales.alias and the possibility of
> > auto-generating it from locale-gen; and noticed that it has non-ASCII
> > characters in it: in particular in 
> > 
> > 	bokmål		no_NO.ISO-8859-1
> > 	français	fr_FR.ISO-8859-1
> > 
> > I think using non-ASCII characters in /etc/locales.alias is dodgy; it
> > would break in non- ISO-8859-1 environments. Should this be supported?
> > Should /etc/locales.alias have a tag describing its encoding?
> > (e.g. an emacs-type tag) 
> 
> I agree with your opinion.  Since non-ASCII characters are defined by
> the locale, they cannot be used before the user specifies the locale.
> Before the user specifies the locale, >0x80 characters are
> "undefined characters".

> ISO-8859-1 is a local encoding, just as EUC-JP is a local encoding for
> Japanese.  In particular, it cannot coexist with multibyte encodings.

That applies to all system textfiles (/etc, /usr/include).

If native-language tags for existing locales are wanted here, then making
an exception for locale.alias is arguable, but it should probably be UTF-8,
not ISO-8859-1.  (Either way, programs using it will need to know to
convert it to the current locale's charset.)

日本語		ja_JP.ISO-2022-JP
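
To make the conversion step concrete, here is a minimal sketch in Python of
what a program reading a UTF-8 alias file might do.  The function name
load_aliases, the sample data, and the choice to silently skip names the
target charset cannot represent are all assumptions for illustration, not
how any existing tool behaves.  In a real program the target charset would
presumably come from the current locale (e.g. locale.nl_langinfo(locale.CODESET)
after setlocale); it is passed explicitly here so the sketch runs anywhere.

```python
def load_aliases(data, target_charset):
    """Parse alias lines stored as UTF-8 bytes and re-encode each alias
    name into target_charset.  Names the charset cannot represent are
    skipped (an assumed policy; a real tool might warn instead)."""
    aliases = {}
    for line in data.decode("utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split()
        if len(parts) != 2:
            continue  # not an "alias locale" pair
        name, locale_name = parts
        try:
            aliases[name.encode(target_charset)] = locale_name
        except UnicodeEncodeError:
            continue  # alias is not expressible in this charset
    return aliases


# Hypothetical file contents, stored as UTF-8.
SAMPLE = (
    "français\tfr_FR.ISO-8859-1\n"
    "日本語\t\tja_JP.ISO-2022-JP\n"
).encode("utf-8")

latin1 = load_aliases(SAMPLE, "iso-8859-1")
eucjp = load_aliases(SAMPLE, "euc_jp")
```

With an ISO-8859-1 target only the French alias survives, as the single
byte 0xE7 for "ç"; with an EUC-JP target the Japanese alias is available
in that encoding's bytes.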

> (If you edit /etc/locale.alias with a multibyte-capable editor in
> multibyte locales, the 8bit "undefined" characters will probably be
> broken.  I felt this difficulty of editing when I translated Debian
> webpage templates with "slices".  To avoid destroying the Debian web
> files, I have to use non-locale-supporting and 8bit editors.  However,
> to edit Japanese, I have to use an 8bit-clean and multibyte-clean editor.)

Why do you need an 8-bit-clean editor to edit Japanese?  If you're
editing textfiles, in most locales, you're almost never going to be
8-bit-clean (i.e., I wouldn't expect an editor in UTF-8 to maintain
invalid UTF-8 sequences).
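
As a rough illustration of that last point, the Python sketch below feeds
an ISO-8859-1 alias line through two decode/encode round trips in UTF-8.
It only models what an editor might do on load and save; the variable
names are made up for the example, and real editors differ in how they
handle undecodable bytes.

```python
# An ISO-8859-1 line as it would sit on disk: 0xE7 is "ç" in Latin-1,
# but on its own it is not a valid UTF-8 sequence.
raw = b"fran\xe7ais\tfr_FR.ISO-8859-1\n"


def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Model of an editor that substitutes U+FFFD for bytes it cannot decode
# and then saves as UTF-8: the original byte is gone for good.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

# Model of a byte-preserving editor: undecodable bytes are carried
# through as lone surrogates and written back unchanged.
preserved = raw.decode("utf-8", errors="surrogateescape").encode(
    "utf-8", errors="surrogateescape"
)
```

Only the second round trip returns the original bytes; the first one
leaves a valid UTF-8 file that no longer contains the Latin-1 alias.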

-- 
Glenn Maynard
