--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: src:glibc: default locale to C.UTF-8
- From: Adam Borowski <kilobyte@angband.pl>
- Date: Sun, 03 Sep 2017 18:49:54 +0200
- Message-id: <150445739412.19645.16155557646492828951.reportbug@umbar.angband.pl>
Package: src:glibc
Version: 2.24-17
Severity: wishlist
Tags: patch
Hi!
Here's a simple patch set to change the default of setlocale(…, "") to
C.UTF-8. This is a drastically smaller change than altering the meaning of
"C" to mean "C.UTF-8" that upstream is mulling over -- it affects only
programs that already have locale support, when the user fails to set any.
If none of LC_ALL, LC_CTYPE or LANG is set, instead of taking this to mean
"C" we assume "C.UTF-8". This is explicitly allowed by POSIX (an
"implementation-defined default locale"). Calling setlocale(…, "C"), or not
calling setlocale() at all, retains the old meaning[1].
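To make the lookup order concrete, here is a minimal sketch of the decision
for a single category. pick_locale_name() and name_is_set() are hypothetical
helpers written for illustration, not glibc functions; the only thing the
patch changes is what gets passed as the fallback ("C" before, "C.UTF-8"
after). An empty variable is treated the same as an unset one.

```c
#include <stddef.h>

/* Hypothetical helper (not part of glibc): a variable counts as set
   only if it exists and is non-empty. */
int
name_is_set (const char *s)
{
  return s != NULL && s[0] != '\0';
}

/* Hypothetical helper (not part of glibc): pick the locale name for one
   category the way setlocale (category, "") is described above.
   lc_all, lc_category and lang stand for the values of LC_ALL, the
   category's own variable (e.g. LC_CTYPE) and LANG; any may be NULL.
   fallback is the implementation-defined default. */
const char *
pick_locale_name (const char *lc_all, const char *lc_category,
                  const char *lang, const char *fallback)
{
  if (name_is_set (lc_all))
    return lc_all;
  if (name_is_set (lc_category))
    return lc_category;
  if (name_is_set (lang))
    return lang;
  /* Nothing usable in the environment: this is the only branch whose
     result changes with the proposed default. */
  return fallback;
}
```

For example, pick_locale_name (NULL, NULL, NULL, "C.UTF-8") yields the new
default, while pick_locale_name ("C", NULL, NULL, "C.UTF-8") still yields
"C", so LC_ALL=C keeps working as an explicit override.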
This is the approach already taken by musl.
I'm not submitting this upstream first as C.UTF-8 is still a Debian-specific
thing.
The improvement would be: if for any reason the user fails to set the
locale (a daemon's startup script is too eager in clearing its environment,
a build chroot fails to inherit env vars, etc.), we'll fall back to a UTF-8
locale. Making a locale-aware program use "C" is still fully possible by
setting LC_ALL=C, but we won't end up without UTF-8 merely by omission.
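A quick way to observe the "cleared environment" case is a probe along these
lines. This is only a sketch: codeset_without_env() is a hypothetical helper
name, and it uses just the standard unsetenv(), setlocale() and
nl_langinfo() calls. On an unpatched glibc I'd expect it to report the
ASCII codeset of the "C" locale, and a UTF-8 one with the patch applied
(provided C.UTF-8 is installed), but check on your own system rather than
taking that as given.

```c
#define _POSIX_C_SOURCE 200809L  /* for unsetenv() and nl_langinfo() */
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: simulate a process whose environment carries no
   locale settings, then ask which character set a locale-aware program
   would end up with.  Returns NULL if setlocale() rejects the default. */
const char *
codeset_without_env (void)
{
  /* Drop every variable that could decide LC_CTYPE. */
  unsetenv ("LC_ALL");
  unsetenv ("LC_CTYPE");
  unsetenv ("LANG");

  /* "" means: take the locale from the environment, or else the
     implementation-defined default. */
  if (setlocale (LC_CTYPE, "") == NULL)
    return NULL;

  return nl_langinfo (CODESET);
}
```

Printing the result with puts() from a small main() is enough to compare
the behaviour of two libc builds.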
This is mostly a one-line patch (1/3); the other two update the testsuite
(2/3) and alter the hard-coded output of /usr/bin/locale (3/3).
Meow!
[1]. Making "C" behave like "C.UTF-8" would be, according to my reading,
compliant with both POSIX-2008@2016 and C11 except for a minor iswblank()
weirdness, but this is not a part of this change.
-- System Information:
Debian Release: buster/sid
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'testing'), (150, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.13.0-rc7-debug-ubsan-00220-g92222baeac7d (SMP w/6 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
From 92d9938c6ba813afaf854d7bc12a9dc0c71371c3 Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilobyte@angband.pl>
Date: Sun, 3 Sep 2017 00:26:47 +0200
Subject: [PATCH 1/3] Default to C.UTF-8 on setlocale(..., "") if no env vars
are set.
This doesn't affect programs that are not prepared to handle arbitrary
locales, as those either don't call setlocale() at all or use setlocale(...,
"C"); it affects only programs which would have used a proper locale had the
user set it up.
This provides a decent default when env var configuration is missing, in a
way that's more robust than mucking with login defs and daemon startup
scripts.
A default locale other than "C" is allowed by POSIX; also at least musl
uses an equivalent of C.UTF-8 already.
---
locale/findlocale.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/locale/findlocale.c b/locale/findlocale.c
index 4cb9d5ea8a..2a12b4e808 100644
--- a/locale/findlocale.c
+++ b/locale/findlocale.c
@@ -123,8 +123,12 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len,
+ _nl_category_name_idxs[category]);
if (!name_present (cloc_name))
cloc_name = getenv ("LANG");
+ /* If no env vars are set, we're free to choose an
+ "implementation-defined default locale":
+ http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
+ */
if (!name_present (cloc_name))
- cloc_name = _nl_C_name;
+ cloc_name = "C.UTF-8";
}
/* We used to fall back to the C locale if the name contains a slash
--
2.14.1
From 612dc7f67f93882b7acb2f035b1cc200ceb2e153 Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilobyte@angband.pl>
Date: Sun, 3 Sep 2017 03:43:10 +0200
Subject: [PATCH 2/3] Adjust the setlocale test suite for C.UTF-8 as default.
---
localedata/bug-setlocale1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/localedata/bug-setlocale1.c b/localedata/bug-setlocale1.c
index 546ea7beb8..2c86e2361d 100644
--- a/localedata/bug-setlocale1.c
+++ b/localedata/bug-setlocale1.c
@@ -39,9 +39,9 @@ do_test (void)
if (d == NULL)
return 1;
- if (strcmp (d, "C") != 0)
+ if (strcmp (d, "C.UTF-8") != 0)
{
- puts ("*** LC_NUMERIC not C");
+ puts ("*** LC_NUMERIC not C.UTF-8");
result = 1;
}
--
2.14.1
From fb6cc4a418c6278dfc2dcf45bc1ea40e06ef9caf Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilobyte@angband.pl>
Date: Sun, 3 Sep 2017 13:43:41 +0200
Subject: [PATCH 3/3] Change hard-coded value for "no defined vars" in
/usr/bin/locale.
---
locale/programs/locale.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/locale/programs/locale.c b/locale/programs/locale.c
index 9da3e5319f..131472766c 100644
--- a/locale/programs/locale.c
+++ b/locale/programs/locale.c
@@ -819,7 +819,7 @@ show_locale_vars (void)
print_assignment (name,
lcall[0] != '\0' ? lcall
: lang[0] != '\0' ? lang
- : "POSIX",
+ : "C.UTF-8",
true);
else
print_assignment (name, val, false);
--
2.14.1
--- End Message ---