
Re: lintian groff-message warning "can't set the locale"



On Mon, 26 Oct 2020 at 00:35:45 -0400, Nick Black wrote:
> Thanks for the quick response, Felix. You say that "[you] will
> probably start setting $LANG in that part of Lintian." what LANG
> will you be using? Attempting to set LANG=en_US.UTF-8 in my
> salsa ci variables resulted in setlocale(3) failing all over the
> place, presumably due to the locale not having been generated.

C.UTF-8 is available on all Debian systems. It's the standard C/POSIX
locale, except that in the C locale the meaning of bytes 0x80-0xFF is
undefined, while in C.UTF-8 they are assumed/defined to be part of a
character encoded in UTF-8.

If you care about portability to non-Debian systems, note that C.UTF-8 is
a somewhat popular extension (I think it originated in the Fedora/Red Hat
family before it was adopted by Debian and other distros) but is far from
universally available. In particular, I'm aware of Arch Linux specifically
*not* having it. The glibc maintainers consider the implementation used
in e.g. Fedora and Debian to be a hack rather than something they want to
maintain forever, but my understanding is that they would be willing to
accept a better implementation.
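
For software that has to run where C.UTF-8 might be missing, one option
is to try it and fall back to plain C. The following is only a minimal
sketch of that idea; the helper name select_c_utf8_locale() is made up
for illustration, and it relies only on the standard behaviour of
setlocale(3) returning NULL when a locale is unavailable.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): try C.UTF-8 first and
 * fall back to the plain C locale. Returns the name that was accepted,
 * or NULL if neither could be set. */
const char *select_c_utf8_locale(void)
{
    static const char *const candidates[] = { "C.UTF-8", "C" };

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        /* setlocale() returns NULL if the requested locale is unavailable. */
        if (setlocale(LC_ALL, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}
```

Note that setlocale() changes the locale for the whole process, so a
helper like this is best called once, early in main().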

en_US.UTF-8 is indeed not portable. Some OSs (Fedora, I think?) always
generate the en_US.UTF-8 locale regardless of any other configuration
that might exist, but Debian does not: if you chose a non-English locale
like fr_FR.UTF-8 or a non-American English locale like en_GB.UTF-8 during
installation, then you will normally only have three locales: your chosen
national locale plus the international locales C and C.UTF-8.

Minimal container/chroot environments, and in particular the official
Debian buildds, will normally only have C and C.UTF-8. See src:gtk+4.0
for an example of how to generate additional locales on-demand if your
unit tests need them.

Third-party software from outside Debian frequently assumes that the
en_US.UTF-8 locale does exist - in particular, it's common enough for
Steam games to want it to exist that Steam's diagnostic tool now checks
for it. This is mostly because it's semi-frequently (ab)used as a way
to parse and serialize C-syntax floating point in programming languages
or configuration files without getting confused by non-English decimal
points (e.g. 1.23 in English locales is 1,23 in French locales, which
means a naive implementation might write {"x": 1,23, "y": 4,56} into a
JSON file, which is of course a syntax error).
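
To make the failure mode concrete, here is a sketch of the kind of naive
serializer described above. The function name write_point_naive() is
hypothetical. It uses snprintf(3), whose %g conversion takes its decimal
point from the current LC_NUMERIC setting.

```c
#include <stdio.h>

/* Hypothetical naive serializer: formats two doubles as a JSON object
 * using the locale-sensitive printf family. The decimal point that %g
 * emits depends on the current LC_NUMERIC locale. */
int write_point_naive(char *buf, size_t size, double x, double y)
{
    return snprintf(buf, size, "{\"x\": %g, \"y\": %g}", x, y);
}
```

A program that never calls setlocale() stays in the C locale and gets
{"x": 1.23, "y": 4.56} for the inputs 1.23 and 4.56. After
setlocale(LC_ALL, "") in a French environment, the same call would be
expected to emit commas instead, which is the broken JSON shown above.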

The portable way to read/write configuration files and C-like source
code is to avoid the POSIX locale-sensitive functions completely,
and use something like GLib's g_ascii_strtod() or CPython's
PyOS_string_to_double() (lots of libraries and frameworks will have an
equivalent, those are just the ones I'm most familiar with). This also
has the advantage of being thread-safe, unlike temporarily switching
locales with setlocale(), which affects the whole process.

Another correct way to do this, available since POSIX.1-2008, is to use
uselocale() with the C locale, but that's unlikely to be portable
to Windows or to exotic Unix implementations, so widely-portable
software generally ends up having to reinvent something equivalent to
g_ascii_strtod() anyway.
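
A minimal sketch of the uselocale() approach, assuming a POSIX.1-2008
system such as glibc on Linux, might look like this. The helper name
parse_c_double() and its return convention are invented for the example.
It is not how GLib or CPython implement their equivalents.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse a C-syntax double ("1.23") regardless of
 * the caller's locale. Returns 0 on success and stores the value in
 * *out, or returns -1 if the text is not a complete, in-range number. */
int parse_c_double(const char *text, double *out)
{
    assert(text != NULL && out != NULL);

    /* Create a locale object for the C locale. */
    locale_t c_locale = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    if (c_locale == (locale_t) 0)
        return -1;

    /* uselocale() switches the locale of the calling thread only,
     * so other threads are not affected. */
    locale_t previous = uselocale(c_locale);

    char *end = NULL;
    errno = 0;
    double value = strtod(text, &end);
    int saved_errno = errno;

    /* Restore the thread's previous locale before freeing ours. */
    uselocale(previous);
    freelocale(c_locale);

    /* Reject empty input, trailing garbage and out-of-range values. */
    if (end == text || *end != '\0' || saved_errno == ERANGE)
        return -1;

    *out = value;
    return 0;
}
```

With this helper, "1.23" parses to 1.23 in any environment, while "1,23"
is rejected because the comma is left over as trailing input. Creating
the locale object on every call keeps the sketch short; real code would
probably create it once and reuse it.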

    smcv

