For the mksh regression tests, I need a working UTF-8 locale; most
systems provide either “en_US.UTF-8” or “en_US.utf8”, with the former
being recommended.
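
To illustrate the kind of probing a test harness ends up doing, here is a
minimal sketch in Python. It is not what the mksh test suite does; the helper
name find_utf8_locale is hypothetical, and the candidate list simply holds the
two spellings named above.

```python
import locale

# The two spellings mentioned in the text; extend as needed.
CANDIDATES = ("en_US.UTF-8", "en_US.utf8")


def find_utf8_locale(candidates=CANDIDATES):
    """Return the first candidate usable for LC_CTYPE, or None.

    Hypothetical helper: it only asks the C library whether the name
    can be selected and restores the previous setting afterwards.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed or not generated on this system
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print(find_utf8_locale() or "no UTF-8 locale available")
```

On a system without any generated UTF-8 locale, the helper returns None, which
is exactly the situation described below.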

Build-depending on locales-all has worked for me so far, with two
exceptions. It won’t do in Kubuntu, where said package does not exist
(the workaround is to run 「locale-gen en_US.UTF-8」 in a pbuilder hook,
but that’s almost certainly not allowed in debian/rules *and* requires
root). It has also recently begun to fail on hurd-i386, where
locales-all fails to install.

The promise of the etch release to bring UTF-8 support was not met,
because a standard installation of etch does not supply any locale
which can be used for LC_CTYPE with UTF-8 support; only installing
locales-all, or installing locales and configuring one via debconf,
will do so. I have to admit that I do not know about lenny, though.

The most light-weight solution would be to
• introduce a “C.UTF-8” locale, as some other OSes did, which is
  equivalent to the “C” (POSIX) locale in all respects *except*
  for LC_CTYPE, where it uses UTF-8 instead of a 7/8-bit character
  set or encoding
• deliver the “C.UTF-8” locale with the base system
• allow Debian packages to depend on its existence, both at
  build and run time
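
As a rough sketch of what depending on such a locale could look like from a
program’s point of view, the following Python snippet keeps every category at
“C” and switches only LC_CTYPE. It assumes the proposed “C.UTF-8” name; the
function name use_c_utf8 is hypothetical, and on systems that do not ship the
locale the call simply reports failure.

```python
import locale


def use_c_utf8():
    """Select "C" for all categories, then "C.UTF-8" for LC_CTYPE only.

    Returns True if the (proposed) C.UTF-8 locale could be selected,
    False if the system does not provide it.
    """
    locale.setlocale(locale.LC_ALL, "C")  # collation, messages etc. stay POSIX
    try:
        locale.setlocale(locale.LC_CTYPE, "C.UTF-8")
    except locale.Error:
        return False  # locale not available; LC_CTYPE remains "C"
    return True


if __name__ == "__main__":
    print("C.UTF-8 available" if use_c_utf8() else "C.UTF-8 not available")
```

The point of the design is that nothing but character classification and
encoding changes, so sorting, number formatting and messages behave as under
plain “C”.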

A more controversial solution would be to apply the second and third
points above to the “en_US.UTF-8” locale instead, but that would
favour US-American conventions. (On the other hand, it is *the* most
widespread UTF-8-capable locale available, and as such, the mksh
regression tests already use it upstream.)