
Bug#1026231: debian-policy: document droppage of support for legacy locales



Package: debian-policy
Version: 4.6.1.1
Severity: wishlist

Hi!
As of Bookworm, legacy locales are no longer officially supported.  To avoid
breaking testsuites, they still mostly work if you install locales-all,
and you may manually request their generation by editing /etc/locale.gen --
but that functionality is expected to bit-rot and/or be removed in the future.

Thus, what about spelling this out in the Policy?

* Software may assume that it always runs in a UTF-8 locale, and may emit
  or require UTF-8 input/output without checking.
* The execution environment (usually init system or a container) must
  default to UTF-8 encoding unless explicitly configured otherwise.
* Legacy locales are no longer officially supported, and packages may
  drop support for them and/or exclude them from their testsuites.
* Packages may retain support for legacy locales, but related bug reports
  (unless security related) are considered to be of wishlist severity.
* Filesystems may be configured to reject file names that are not valid
  printable UTF-8 encoded Unicode.
* So-called BOM (U+FEFF) must not be added to plain-text output, and if
  present, editors/viewers customarily used for editing code should not
  hide its presence.
* Human-readable files outside of packages' private data must be encoded
  in UTF-8.  This applies especially to files in /usr/share/doc and /etc
  but applies to e.g. executable scripts in /bin or /sbin as well.
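
To make the file name and file content points concrete, here is a minimal
Python sketch of what such checks could look like.  Both function names are
hypothetical, and reading "valid printable UTF-8" as "decodes strictly and
contains no unprintable characters" is an assumption, not proposed Policy
wording.

```python
def is_valid_printable_utf8_name(raw_name: bytes) -> bool:
    """Return True if raw_name is strict UTF-8 with only printable characters.

    Hypothetical helper: one possible reading of "valid printable UTF-8".
    """
    try:
        text = raw_name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    # str.isprintable() rejects control characters such as newline or ESC.
    # An empty name is rejected explicitly, because "".isprintable() is True.
    return bool(text) and text.isprintable()


def is_utf8_text(data: bytes) -> bool:
    """Return True if data (e.g. the contents of a doc file) decodes as UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

A lone 0xE9 byte (ISO-8859-1 "é") fails both checks, while the two-byte
UTF-8 form 0xC3 0xA9 passes.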

Rationale: it takes a non-trivial amount of code to support diverse encodings;
Unicode is a strict superset of all legacy charsets, thus there's no loss of
functionality by switching to it exclusively.  In today's Unicode world, text
files in other encodings present a barrier to being read by the user.

While data received from outside the network may legitimately use legacy
locales, requiring all of stdin/stdout/stderr and filesystem data to use
UTF-8 would simplify code.  It's not like we pay more than lip service to
other encodings anymore...
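
As a rough illustration of that split, a program could transcode once at
the boundary and deal only with UTF-8 from then on.  The sketch below
assumes the sender declares its charset (as MIME or HTTP headers do);
to_utf8 is a hypothetical helper name.

```python
def to_utf8(payload: bytes, declared_charset: str) -> bytes:
    """Transcode externally received bytes to UTF-8 at the boundary.

    Raises UnicodeDecodeError if payload is not valid in declared_charset,
    and LookupError if Python does not know the charset name.
    """
    return payload.decode(declared_charset).encode("utf-8")


# Legacy-encoded input arrives from outside...
incoming = "zażółć".encode("iso-8859-2")
# ...and everything past this line only ever sees UTF-8.
internal = to_utf8(incoming, "iso-8859-2")
```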

While diversity in software is welcome, diversity in standards is not:
UTF-8 will not damage your pinky finger nor require Alt-F2 kill -9 to
exit; will not make your computer fail to boot or require a trip to the
data center; nor infect your K desktop with gnomeitis.  [Of course, there's
no plausible reason to use Postfix, ever!].  In other words, having multiple
phone vendors is essential but having multiple charging connectors is bad.

As for BOM, it is explicitly discouraged by the Unicode Consortium, and can
cause security vulnerabilities where scripts that pass human review act
differently than they appear.  <FEFF>#!/bin/perl gets executed by bash, and
this is just one example.
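
The shebang case can be sketched in a few lines of Python.  This assumes
the usual Linux behaviour that an interpreter line is recognized only when
the first two bytes of the file are "#!"; the helper names are hypothetical.

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if data begins with the UTF-8 encoding of U+FEFF."""
    return data.startswith(codecs.BOM_UTF8)


def begins_with_shebang(data: bytes) -> bool:
    """Return True if the very first two bytes are '#!'."""
    return data[:2] == b"#!"


script = b"#!/bin/perl\nprint qq(hello\\n);\n"
with_bom = codecs.BOM_UTF8 + script

# A viewer that hides the BOM shows both files identically, yet only the
# first one starts with the two bytes that mark an interpreter line.
looks_identical = with_bom.decode("utf-8-sig") == script.decode("utf-8")
```

Here begins_with_shebang(script) is true and begins_with_shebang(with_bom)
is false, although the two files look the same once the BOM is hidden.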

As for inits/containers declaring LC_CTYPE=C.UTF-8: systemd has been doing
this for a while.  In sysvinit land we debated whether that's still needed
once glibc started to consider unset locale to mean C.UTF-8 rather than C
-- but then, some language compilers do not use glibc.  debootstrap doesn't
configure a default locale, and not all higher-level tools do so either,
so a system installed in a non-standard but reasonable way may lack
the setting, to the surprise of the admin.
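
For admins who want to see what a given environment actually ends up with,
here is a small Python sketch.  It follows the POSIX lookup order (LC_ALL,
then LC_CTYPE, then LANG) and returns None when nothing is set, since what
"unset" means is exactly the libc-dependent part.  Both function names are
hypothetical.

```python
from typing import Mapping, Optional


def effective_ctype(env: Mapping[str, str]) -> Optional[str]:
    """Return the locale name governing LC_CTYPE, or None if nothing is set.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values count as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return None


def is_utf8_locale(name: str) -> bool:
    """Return True if the codeset part of a locale name is UTF-8.

    Accepts spellings such as "C.UTF-8", "pl_PL.utf8", "ca_ES.UTF-8@valencia".
    """
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"
```

Called as effective_ctype(os.environ) from an init script hook or inside a
freshly debootstrapped chroot, this shows whether the setting is missing.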


Meow!

