Re: Seeing clarification for locale names

To: Marc Haber <mh+debian-glibc@zugschlus.de>
Cc: debian-glibc@lists.debian.org
Subject: Re: Seeing clarification for locale names
From: Florian Weimer <fweimer@redhat.com>
Date: Mon, 15 Feb 2021 17:20:30 +0100
Message-id: <87blcl8fqp.fsf@oldenburg.str.redhat.com>
In-reply-to: <YCkwx0yaDLyqN+8O@torres.zugschlus.de> (Marc Haber's message of "Sun, 14 Feb 2021 15:16:39 +0100")
References: <YCkwx0yaDLyqN+8O@torres.zugschlus.de>

* Marc Haber:

> I would appreciate pointers to documentation, personal opinions, war
> stories, encoding tales, historic lectures, anything that might
> enlighten me and help me build the knowledge and understanding about
> how UNIX locales are supposed to work in Debian GNU/Linux. Thank you
> in advance!

For the charset normalization, it's in the manual:

The only new thing is the @code{normalized codeset} entry.  This is
another goodie which is introduced to help reduce the chaos which
derives from the inability of people to standardize the names of
character sets.  Instead of @w{ISO-8859-1} one can often see @w{8859-1},
@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}.  The @code{normalized
codeset} value is generated from the user-provided character set name by
applying the following rules:

@enumerate
@item
Remove all characters besides numbers and letters.
@item
Fold letters to lowercase.
@item
If the same only contains digits prepend the string @code{"iso"}.
@end enumerate

@noindent
So all of the above names will be normalized to @code{iso88591}.  This
allows the program user much more freedom in choosing the locale name.
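
As a rough illustration of the three rules quoted above, here is a
minimal Python sketch.  The function name normalize_codeset is
hypothetical, and this is not glibc's actual code (which is written in
C); it only restates the rules as the manual words them, assuming that
"numbers and letters" means ASCII digits and letters.

```python
def normalize_codeset(name: str) -> str:
    """Restate the manual's three normalization rules (illustrative only)."""
    # Rule 1: remove all characters besides (ASCII) numbers and letters.
    kept = "".join(c for c in name if c.isascii() and c.isalnum())
    # Rule 2: fold letters to lowercase.
    kept = kept.lower()
    # Rule 3: if the result contains only digits, prepend "iso".
    if kept.isdigit():
        kept = "iso" + kept
    return kept


if __name__ == "__main__":
    # The spellings listed in the manual excerpt.
    for spelling in ("ISO-8859-1", "8859-1", "88591", "iso8859-1", "iso_8859-1"):
        print(spelling, "->", normalize_codeset(spelling))
```

Under these assumptions, every spelling from the excerpt comes out as
iso88591, which matches what the manual says.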


This code dates back to the mid-90s, I think.

In general, I think it is best to treat locale names as opaque strings.
Parsing them to derive charsets is not going to work (e.g., a name
without a charset part can mean ISO-8859-1 or UTF-8, depending on the
age of the locale).  To get the charset of the current locale, you can
use “locale -k charmap”, for example.  It prints the glibc charmap name
(of which there aren't too many).
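
From a program, the same question can be put to the C library instead
of parsing the locale name.  The following is a minimal Python sketch,
assuming a Unix-like system where nl_langinfo is available; the
function name current_charmap is hypothetical.  On glibc I would expect
the result to match what "locale charmap" prints, but treat that as an
assumption rather than a guarantee.

```python
import locale


def current_charmap() -> str:
    """Return the codeset name the C library reports for LC_CTYPE."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # keep whatever locale is already active.
        pass
    # Ask the C library for the codeset instead of parsing the name.
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(current_charmap())
```

The point of the design is that the locale name itself is never
inspected, so a name without a charset part is not a problem.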

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
