Re: Seeking clarification for locale names
* Marc Haber:
> I would appreciate pointers to documentation, personal opinions, war
> stories, encoding tales, historic lectures, anything that might
> enlighten me and help me build the knowledge and understanding about
> how UNIX locales are supposed to work in Debian GNU/Linux. Thank you in
> advance!
For the charset normalization, it's in the manual:
The only new thing is the @code{normalized codeset} entry. This is
another goodie which is introduced to help reduce the chaos which
derives from the inability of people to standardize the names of
character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1},
@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized
codeset} value is generated from the user-provided character set name by
applying the following rules:
@enumerate
@item
Remove all characters besides numbers and letters.
@item
Fold letters to lowercase.
@item
If the same only contains digits prepend the string @code{"iso"}.
@end enumerate
@noindent
So all of the above names will be normalized to @code{iso88591}. This
allows the program user much more freedom in choosing the locale name.
This code dates back to the mid-90s, I think.
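To make the three rules from the manual concrete, here is a rough Python sketch of them. It is my own illustration and not the actual glibc code; the function name normalize_codeset is made up for this example, and I assume only ASCII letters and digits count as "numbers and letters".

```python
def normalize_codeset(name: str) -> str:
    """Sketch of the three normalization rules quoted from the manual.

    Hypothetical helper for illustration only; not the glibc implementation.
    """
    # Rules 1 and 2: keep only ASCII letters and digits, folded to lowercase.
    normalized = "".join(
        c.lower() for c in name if c.isascii() and c.isalnum()
    )
    # Rule 3: if only digits are left, prepend "iso".
    if normalized and normalized.isdigit():
        normalized = "iso" + normalized
    return normalized


if __name__ == "__main__":
    for spelling in ("ISO-8859-1", "8859-1", "88591", "iso8859-1", "iso_8859-1"):
        print(spelling, "->", normalize_codeset(spelling))
```

All five spellings mentioned in the manual excerpt come out the same under this sketch.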
In general, I think it is best to treat locale names as opaque strings.
Parsing them to derive charsets is not going to work (e.g., a name without
a charset can mean ISO-8859-1 or UTF-8, depending on the age of the locale). To
get the charset of the current locale, you can use “locale -k charmap”,
for example. It corresponds to the glibc charmap name (of which there
aren't too many).
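If you need the same information from inside a program rather than from the shell, one way to ask the C library instead of parsing the name is sketched below in Python. This assumes a Unix-like system where locale.nl_langinfo is available; the helper name current_charmap and the fallback to the "C" locale are my own choices for the example.

```python
import locale


def current_charmap() -> str:
    """Return the codeset of the current LC_CTYPE locale.

    Hypothetical helper; asks the C library instead of parsing the locale name.
    """
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Assumption for this sketch: fall back to "C" if the requested
        # locale is not installed on the system.
        locale.setlocale(locale.LC_CTYPE, "C")
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(current_charmap())
```

The result depends on the environment and on which locales are installed, so no particular output should be expected.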
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill