
locale name standard



[please cc me, as indicated in headers]

Many Debian users living outside of the US use locales, i.e. they set
the environment variable LANG, or LC_TIME, LC_NUMERIC, etc. This post
is about the values these variables should be set to.

While the locale interface itself is well standardized, there seems to
be no semblance of a standard for these values. SUSv2 calls them
"implementation dependent". The current LSB draft does not mention
locales. The glibc manual avoids naming names. The only concrete
documentation I could find was the setlocale(3) manpage (see below).

What's the problem?

Well, if you give a value not understood by the program you use, it is
generally ignored. There are at least two prominent libraries (glibc
and libX11) interpreting the value, and the sets of values they accept
are not the same. Since there is no standard, every implementation can
claim to be right. Since there is almost no documentation, users are
left with trial and error.

libc example (only works when the locale is present, see
/etc/locale.gen):

$ for LANG in de de_DE de_DE.foo de_DE.ASCII de_de.iso-8859-1 de_DE.iso-8859-1 de_DE.ISO-8859-1 de_DE.ISO8859-1 de_DE.ISO-88591 de_DE.ISO88591; do printf "%-20s " "$LANG"; date +"%A"; done
de                   Tuesday
de_DE                Dienstag
de_DE.foo            Tuesday
de_DE.ASCII          Tuesday
de_de.iso-8859-1     Tuesday
de_DE.iso-8859-1     Dienstag
de_DE.ISO-8859-1     Dienstag
de_DE.ISO8859-1      Dienstag
de_DE.ISO-88591      Tuesday
de_DE.ISO88591       Dienstag

Wow, so for strftime to work correctly, language and territory have to
be there (case is significant). The charset must be absent or known;
here case is irrelevant, and some dashes may be omitted, but not in
every combination. Eeek.

It gets hairier with libX11. Let's use a gtk program because they
issue nice warnings when X11 did not recognize the locale.
From the above variants, only
de_DE
de_DE.ISO8859-1
succeed without warning, meaning they are recognized by glibc /and/
xlib. The others give the warning, and problems with latin1 characters
(see bug#100970).

OTOH, glibc seems to suggest that "de_DE.ISO-8859-1" is the standard
value, see /etc/locale.alias. "locale -a" reports deutsch, german,
de_DE, de_DE@euro. None of these are standard according to xlib.
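
To check whether a particular spelling is among the names glibc
reports, something like the following might do. This is only a
minimal sketch: locale_available is a made-up helper name, and it does
an exact string match against the output of "locale -a", so a spelling
that glibc accepts but lists differently will be reported as missing.

```shell
# Hypothetical helper (not part of any package): succeed if the exact
# name $1 appears in the list printed by "locale -a".
locale_available() {
    # -F: fixed string, -x: whole line must match, -q: no output
    locale -a 2>/dev/null | grep -Fqx -- "$1"
}

for name in de_DE de_DE.ISO8859-1 de_DE@euro; do
    if locale_available "$name"; then
        printf '%-20s listed\n' "$name"
    else
        printf '%-20s not listed\n' "$name"
    fi
done
```

This says nothing about what xlib thinks of the name; it only looks at
the glibc side.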

Rather than populating /usr/X11R6/lib/X11/locale/locale.alias with a
gazillion alternatives as suggested in bugs 84735, 86903, and 99350,
why not standardize on one format and *document* that?

I.e. (adapted from setlocale(3)):

  A locale name is either a convenience alias (see below), or of the
  form ll_TT[.codeset][@modifier], where ll is a two-letter ISO 639
  language code in lower case, and TT is a two-letter ISO 3166 country
  code in upper case.

  The optional codeset fragment is a character set or encoding
  identifier. Currently defined values are ISO8859-1, ISO8859-2 [more
  here]. Variant spellings like iso-8859-1, while accepted by some
  applications for compatibility, are deprecated. The default codeset
  is [what?]

  The optional modifier can select variants of the locale. The only
  currently defined value is euro (to select the Euro as currency).

  Convenience aliases are intended to be understandable to users of
  the locale without need for further documentation. They must not
  have an underscore as their third character. Systems should
  provide, as aliases for each locale, its description in English
  and in the locale's own language (e.g. german and deutsch).
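
The non-alias form above could be checked mechanically. The following
is a sketch under loud assumptions: is_canonical_locale is a
hypothetical name, and since the list of codesets is still incomplete
the pattern admits only ISO8859-N as codeset and euro as modifier.
Convenience aliases are not handled at all.

```shell
# Hypothetical checker for the proposed ll_TT[.codeset][@modifier] form.
# Assumption: only ISO8859-<number> codesets and the "euro" modifier
# are allowed, because the draft text names nothing else yet.
is_canonical_locale() {
    # LC_ALL=C keeps the [a-z] and [A-Z] ranges strictly ASCII.
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eq '^[a-z]{2}_[A-Z]{2}(\.ISO8859-[0-9]+)?(@euro)?$'
}

for name in de_DE de_DE.ISO8859-1 de_DE@euro de de_de.iso-8859-1 de_DE.ISO-8859-1; do
    if is_canonical_locale "$name"; then
        printf '%-20s ok\n' "$name"
    else
        printf '%-20s rejected\n' "$name"
    fi
done
```

With this pattern de_DE, de_DE.ISO8859-1 and de_DE@euro pass, while
de, de_de.iso-8859-1 and de_DE.ISO-8859-1 are rejected.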

-- 
Robbe
