
Re: Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale



Thorsten Glaser wrote:
Giacomo A. Catenazzi dixit:

I think you misunderstand the mksh part of the problem.

mksh has two modes: a legacy mode, in which it makes no
assumptions about charsets or encodings and is 8-bit clean and
mostly 8-bit transparent, save for a few (mostly past) bugs and
implementation shortcomings; and a Unicode mode, in which it
assumes its input is UTF-8 (although, with ^V, you can still enter
non-UTF-8 sequences, and tab-complete filenames in legacy
encodings as well). The Unicode mode is enabled with "mksh -U" or
"set -U". However, mksh has a feature which automatically enables
the Unicode mode if
- the current CODESET is UTF-8 (or the locale ends in .utf8 or
  .UTF-8 or something similar, in some cases), or
- the input begins with a UTF-8 BOM.

This is a good way to do things!
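
To make the two conditions above concrete, here is a rough sketch of
the decision. It is my own illustration, not mksh's actual code; the
function names are made up, and the exact set of spellings mksh
accepts is not something I have checked.

```python
import codecs

# The UTF-8 byte order mark: b"\xef\xbb\xbf"
UTF8_BOM = codecs.BOM_UTF8


def codeset_is_utf8(codeset: str) -> bool:
    """True for spellings such as 'UTF-8', 'utf-8' or 'utf8'."""
    return codeset.replace("-", "").lower() == "utf8"


def locale_name_is_utf8(locale_name: str) -> bool:
    """Fallback check on the codeset part of a name like 'en_US.UTF-8@euro'."""
    name = locale_name.split("@", 1)[0]  # drop an optional @modifier
    if "." not in name:
        return False  # no codeset part, e.g. plain 'C' or 'de_DE'
    return codeset_is_utf8(name.rsplit(".", 1)[1])


def should_enable_unicode_mode(codeset: str, locale_name: str,
                               script_head: bytes) -> bool:
    """Hypothetical helper: combine the CODESET, locale-name and BOM checks."""
    return (
        codeset_is_utf8(codeset)
        or locale_name_is_utf8(locale_name)
        or script_head.startswith(UTF8_BOM)
    )
```

In a real program the codeset argument would probably come from
locale.nl_langinfo(locale.CODESET) after a call to
locale.setlocale(locale.LC_CTYPE, ""), and script_head would be the
first few bytes of the script being read.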


The regression test suite merely checks for this feature. To do
so, it needs a way to set the checked mksh process' CODESET to
UTF-8, which is only possible by setting a non-C/POSIX locale.

This means that we run few automatic regression tests ;-)

So the UTF-8 requirement here is a lot narrower than the
rest of the discussion.

I think that we should provide some package that gives the pbuilder
environment a UTF-8 locale, or a debhelper (or similar) utility
that constructs one for build needs.

In this case "en_US.UTF-8" is a sensible locale (we want to test
a real locale), but I would also test some UTF-16 or Asian
locale (mksh should not assume UTF-8 in these cases).

You already had a solution (but embedding it in a standard utility
is IMHO better, since it hides the complexity and shows directly
what you need).  BTW the locale could also be a pathname, so
no root privileges are needed (i.e. for other tests in userland).


Andrew McMillan dixit:

The proposal, at this stage is only that the C.UTF-8 locale is
*installed* and *available* by default.  Not that it *be* the default,
but that it *be there* as a default.

This is about what I was going to propose, indeed.

I agree that we must also provide a UTF-8 locale by default, but I
don't like "C.UTF-8".  A solution: require every locale to also have
a UTF-8 "brother", and force installation of that locale when the
user chooses (at installation time) a non-UTF-8 locale.
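
To illustrate the "brother" idea: given the locale the user picked,
the installer would compute the UTF-8 variant to install next to it.
This is only a sketch; the function name is made up, and it assumes
the usual language[_territory][.codeset][@modifier] naming.

```python
def utf8_sibling(locale_name: str) -> str:
    """Hypothetical helper: return the UTF-8 variant of a locale name."""
    if locale_name in ("C", "POSIX"):
        # Whether "C" should get a UTF-8 variant at all is the open
        # question of this thread, so it is left untouched here.
        return locale_name
    # Split off an optional @modifier, e.g. 'de_DE@euro'.
    base, sep, modifier = locale_name.partition("@")
    # Drop any existing codeset, e.g. '.ISO-8859-1'.
    language = base.split(".", 1)[0]
    return f"{language}.UTF-8{sep}{modifier}"
```

So a user choosing de_DE.ISO-8859-1 would also get de_DE.UTF-8, and a
locale that is already UTF-8 maps to itself.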

"C" is not offered at installation time (but IIRC KDE offered it
at first run, some versions ago).

For the build environment I prefer "en_US.UTF-8" (we need English
to read logs), or building the locale when needed (better, because
we probably need other locales to test, and some packages probably
need an Asian locale for building/testing).



Andrew McMillan dixit:

Once this minimum step is made, and we've all calmed down, we can think
further on radical and dramatic changes over coming years where more
significant shifts are made, like:

* The default locale at installation is C.UTF-8 rather than C.

That would be nice.

C is not the default locale. "en_US.UTF-8" is the default
(d-i of lenny, pressing only ENTERs).


Andrew McMillan dixit:

[...] and indeed Steve
Langasek has already suggested a seemingly reasonable workaround for the
immediate problem which was, funnily enough, that mksh wants to have a
UTF-8 locale *available* in order for it to *test the build*...

Yes, his suggestion and searching for someone to actually use it
(Daniel Jacobowitz does) helped that part of the problem. However,
the mksh regression test suite is only one of the manifestations.
Even as a mere user, I'd like to have, see above, a UTF-8 locale
available and, if possible, default. Well, maybe not a UTF-8 locale,
just UTF-8 encoding (especially when I ssh from a MirBSD system to
a Debian system, since on MirBSD there is *only* UTF-8¹), but glibc
defines encodings exclusively via locales, which is why I'm in
favour of C.UTF-8 for myself, but setting LC_CTYPE only has the same
effect (and I often set LC_MESSAGES to en_GB.UTF-8 for gcc's
benefit).

But your case is very specific (to package building), and
in this case we want a minimal build environment.
Additionally, it is for testing purposes: you test UTF-8,
while other packages may need other locales.

Anyway, I agree that a UTF-8 locale could be installed by default
(also in pbuilder), but we also need a locale utility for
debian/rules, and the user should get the right UTF-8 locale
(so for a generic user, not C.UTF-8 but xz_YW.UTF-8,
if they normally use xz_YW).

ciao
	cate

