
Re: LANGUAGE vs LANG



Miroslav Kure wrote:

[ are you on debian-i18n, or would you like to be Cc'ed? ]

> After a fresh install of Sarge in Czech you end up with
> a system where (almost) everything is localised to Czech
> and you can read/write iso-8859-2 characters.
> 
> % cat /etc/environment
> LANGUAGE="cs_CZ:cs:en_GB:en"
> LANG=cs_CZ
> 
> The problem appears when you switch LANG to English, e.g.
> export LANG=en_GB

I _think_ the problem is that GNU Gettext (and other
libraries using the two environment variables) assumes that
"LANG" has the same value as the first field in "LANGUAGE".
This is a somewhat unsafe assumption, so perhaps "LANGUAGE"
should internally be converted to ${LANG}:${LANGUAGE} before
further processing.
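A minimal sketch of that conversion, written in Python purely for illustration. The helper `effective_languages` is hypothetical and is not GNU Gettext code; it puts the "LANG" language first, drops any ".codeset" suffix and skips duplicates:

```python
def effective_languages(lang, language):
    """Return the priority list ${LANG}:${LANGUAGE} without duplicates.

    Hypothetical helper sketching the proposed normalisation; it is
    not taken from GNU Gettext.
    """
    def strip_codeset(name):
        # "de_DE.UTF-8@euro" -> "de_DE@euro", "en_GB" -> "en_GB"
        head, sep, modifier = name.partition("@")
        return head.partition(".")[0] + sep + modifier

    result = []
    for name in [lang or ""] + (language or "").split(":"):
        name = strip_codeset(name.strip())
        if name and name not in result:
            result.append(name)
    return result


print(effective_languages("en_GB", "cs_CZ:cs:en_GB:en"))
# ['en_GB', 'cs_CZ', 'cs', 'en']
```

With the /etc/environment from the report and LANG=en_GB, English would then take precedence over Czech.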

> (with en_GB of course generated via dpkg-reconfigure
> locales), it still speaks Czech, but the iso-8859-2
> specific characters are garbled, because en_GB uses
> the iso-8859-1 charset.
> 
> You have to either unset LANGUAGE or remember to set it to
> the same value as LANG (e.g. export LANGUAGE="en_GB").
> 
> The question is, what is this variable for?

The "LANGUAGE" environment variable is used to set a
prioritised list of languages, so that GNU Gettext does not
have to fall straight back to the POSIX locale when
translations are available in a language that is more
appropriate for the user.
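Roughly, the lookup walks that list in order and uses the first language that has an installed catalog. A sketch under simplifying assumptions: `pick_translation` and the `available` set are hypothetical stand-ins for real catalog lookup, and only the "language_TERRITORY" to "language" fallback is tried:

```python
def pick_translation(priority, available):
    """Return the first language in `priority` with an installed catalog.

    `available` is a hypothetical stand-in for the installed catalogs.
    If nothing matches, fall back to "C", i.e. untranslated messages.
    """
    for lang in priority:
        # Try the full name first, then the bare language ("cs_CZ" -> "cs").
        for candidate in (lang, lang.split("_", 1)[0]):
            if candidate in available:
                return candidate
    return "C"


installed = {"cs", "de"}
print(pick_translation(["sk", "cs_CZ", "en"], installed))  # cs
print(pick_translation(["fr_FR"], installed))              # C
```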

I don't think "LANGUAGE" should influence character
encoding, date formats, etc.  One can of course imagine
cases where a fall-back language cannot be represented in
the character encoding chosen with "LANG", but that is the
user's choice.

> In woody and earlier, where I had to set up the environment
> manually, I didn't use the LANGUAGE variable and everything
> worked fine. "man locale" doesn't mention this variable
> either.

"LANGUAGE" is an extension to "LANG".  Since "LANG" is a
part of the POSIX standard (IIRC), it would be inappropriate
to just append the fall-back languages to "LANG", and that
is most likely why we now have both "LANG" and "LANGUAGE".
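For comparison, "LANG" holds a single locale name of the X/Open form language[_territory][.codeset][@modifier], which leaves no room for a list. A small parser makes that concrete; `split_locale` is a hypothetical helper, not a standard function:

```python
import re

# language[_territory][.codeset][@modifier]; ":" is excluded everywhere,
# so a LANGUAGE-style list cannot pass as a single locale name.
_LOCALE_NAME = re.compile(
    r"(?P<language>[^_.@:]+)"
    r"(?:_(?P<territory>[^.@:]+))?"
    r"(?:\.(?P<codeset>[^@:]+))?"
    r"(?:@(?P<modifier>[^:]+))?"
)


def split_locale(name):
    """Split one locale name into its parts (None where a part is absent)."""
    match = _LOCALE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a single locale name: {name!r}")
    return match.groupdict()


print(split_locale("cs_CZ.ISO-8859-2"))
# {'language': 'cs', 'territory': 'CZ', 'codeset': 'ISO-8859-2', 'modifier': None}
print([split_locale(n)["language"] for n in "cs_CZ:cs:en_GB:en".split(":")])
# ['cs', 'cs', 'en', 'en']
```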

If "LANGUAGE" isn't set, all applications should work in the
old-fashioned POSIX way, with POSIX/C as the only fall-back
"language".
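Python's standard `gettext` module, which is a separate reimplementation and not GNU Gettext itself, uses a similar precedence and makes the reported behaviour easy to reproduce: according to its documentation, `gettext.find()` takes its colon-separated language list from the first non-empty one of LANGUAGE, LC_ALL, LC_MESSAGES and LANG. The sketch below creates empty placeholder catalogs in a temporary directory (the domain name `demo` and the helper `chosen_catalog` are made up for illustration), since `find()` only checks that the files exist:

```python
import gettext
import os
import tempfile

LOCALE_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


def chosen_catalog(localedir, env):
    """Return the language directory gettext.find() picks, or None."""
    saved = {name: os.environ.pop(name, None) for name in LOCALE_VARS}
    try:
        os.environ.update(env)
        path = gettext.find("demo", localedir)  # "demo" is a made-up domain
    finally:
        # Restore the caller's environment exactly as it was.
        for name, value in saved.items():
            os.environ.pop(name, None)
            if value is not None:
                os.environ[name] = value
    if path is None:
        return None
    return os.path.relpath(path, localedir).split(os.sep)[0]


with tempfile.TemporaryDirectory() as localedir:
    # Empty placeholder catalogs: find() only checks that the files exist.
    for lang in ("cs", "en"):
        catalog_dir = os.path.join(localedir, lang, "LC_MESSAGES")
        os.makedirs(catalog_dir)
        open(os.path.join(catalog_dir, "demo.mo"), "wb").close()

    picked_both = chosen_catalog(
        localedir, {"LANGUAGE": "cs_CZ:cs:en_GB:en", "LANG": "en_GB"})
    picked_lang_only = chosen_catalog(localedir, {"LANG": "en_GB"})
    picked_unknown = chosen_catalog(localedir, {"LANG": "fr_FR"})

print(picked_both)       # cs   (LANGUAGE wins over LANG)
print(picked_lang_only)  # en   (no LANGUAGE, so LANG is used)
print(picked_unknown)    # None (no catalog: messages stay untranslated)
```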


Summary:

It would probably be a good idea to send a bug report
upstream to the GNU Gettext maintainers, suggesting that
they:

 1) make sure that the language set with "LANG" is always
    prioritised over the ones set with "LANGUAGE"

 2) make sure to stick to the encoding set with "LANG", even
    when they fall back to a different language in the user
    interface translations
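The second point can be illustrated with Python's codecs. The Czech word below is only sample text, and `display_in_locale_charset` is a hypothetical helper. Reading ISO-8859-2 bytes as ISO-8859-1, which is what happens in the report, turns Czech letters into unrelated Latin-1 characters, whereas converting the translation to the "LANG" charset keeps what can be represented and visibly substitutes the rest:

```python
def display_in_locale_charset(text, locale_charset):
    """Encode a translated message for the charset chosen with LANG.

    Characters the charset cannot represent become "?" instead of
    being silently misread as other characters.
    """
    return text.encode(locale_charset, errors="replace")


message = "Čeština"  # sample text, as stored in an ISO-8859-2 Czech catalog

# What the report describes: ISO-8859-2 bytes displayed as ISO-8859-1.
garbled = message.encode("iso-8859-2").decode("iso-8859-1")
print(garbled)  # Èe¹tina

# Point 2: convert to the charset chosen with LANG instead.
converted = display_in_locale_charset(message, "iso-8859-1")
print(converted)  # b'?e?tina'
```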

(but please first check that my analysis is correct)


Greetings,

Jacob
-- 
"Any, sufficiently complicated, experiment is indistinguishable from magic."


