
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale



Andrew McMillan wrote:
On Tue, 2009-04-07 at 22:32 +0200, Adeodato Simó wrote:
It is my impression that more packages than mksh could use an UTF-8
locale at build time (I’m afraid I don’t have pointers, but I’m sure
I’ve come across at least a couple).

Wouldn’t it be just better to change Debian’s default to make an UTF-8
locale available by default, rather than to force all those packages to
play tricks with LOCPATH?

I too would really like to see a UTF-8 locale available by default, and
would prefer to see this be the C.UTF-8 locale, which doesn't screw with
the collation / character type settings like any other UTF-8 locale
would.

It seems to me that the consensus here is that having a UTF-8 locale
available is a good idea and I don't hear any very strong argument
against such a change.

Consequently I think we should move on from the discussion and start
working out a patch to resolve this in policy.

So I have a question: what does UTF-8 mean in this context (C.UTF-8)?

It is not a stupid question, and the answer is not "the UTF-8 algorithm
to encode/decode Unicode".
I still think that you are confusing the various meanings.
And until I understand the problem, I cannot propose a solution.

- terminals should be sensitive to charsets when choosing how to display
  things
- programs should be sensitive to locales (the topic of this discussion):
  a locale provides some charset-dependent strings and the interpretation
  of the various characters, but (usually) it MUST NOT translate characters.

Anyway:

The C locale is already a UTF-8 compatible locale.
No? Then what is it missing?
- other alphabetic, numeric, currency, whitespace characters?  But no UTF-8
  locale provides all characters: each defines only the range needed for its
  language [see Wikipedia, which should code UTF-8 as binary for this reason].
  The "C" "spoken" language requires only ASCII-7 (or maybe only a subrange of it).
  So why do we need further characters?
  Note: POSIX restricts the "blank" class in the "C" locale to only two
  values (space and tab).

  We could use the UTF-8 charset for the C locale, declaring all c > 127
  unused/illegal.  Would this solve the problems with mksh? I don't think so,
  so what do you need in this C.UTF-8?
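
  To make the "already UTF-8 compatible" point concrete, here is a small
  Python sketch (the helper name is_valid_utf8 is mine, purely for
  illustration). It only checks the encoding side: every ASCII-7 byte is
  valid UTF-8 on its own, while no single byte above 127 is. It says
  nothing about what a C.UTF-8 locale would define for classification or
  collation.

```python
# Illustration only: ASCII-7 is a strict subset of UTF-8.
# is_valid_utf8 is a hypothetical helper, not part of any locale API.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# All 128 seven-bit values, as one byte string.
ascii7 = bytes(range(128))

# Every 7-bit byte decodes to the same text as ASCII and as UTF-8.
same_text = ascii7.decode("ascii") == ascii7.decode("utf-8")

# Single bytes above 127 that would be valid UTF-8 on their own.
lone_high = [b for b in range(128, 256) if is_valid_utf8(bytes([b]))]

# A non-ASCII character needs a multi-byte sequence instead.
e_acute = "\u00e9".encode("utf-8")
```

  So "declaring all c > 127 illegal" gives exactly ASCII-7 again: the
  charset name changes, the accepted input does not.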

I still think that "en_US.UTF-8" is the right default (note:
I'm not a US citizen, nor do I speak English).

The installer will set up the correct locale, so the en_US period is very
short (we'll dominate them ;-) ).

On debootstrap/pbuild/... things are different.  But if this is the problem,
let us look for a solution for the build environment (and I still think that
in this environment "en_US.UTF-8" could be nice).

But I would prefer a simple basic ASCII-7 "C" for the basic/plain build; only
after the packager has decided whether a specific build with UTF-8 is a bug or
a feature should it be set manually.
Why does a build need to depend on a locale?
The UNIX way is to allow compiling things for a remote system (maybe another
OS, another arch).
For testing? Then why not test various locales (UTF-8, but also other
non-ASCII-based encodings)?
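
As a rough sketch of what "test various locales" could look like, the
Python snippet below probes which locale names the build system actually
accepts before any test is run under them. The function name and the
candidate list are my own examples, not anything a package uses today;
only "C" can be assumed to exist everywhere.

```python
import locale

def available_locales(candidates):
    """Return the candidate locale names that setlocale() accepts here."""
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    found = []
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                found.append(name)
            except locale.Error:
                # Locale not generated/installed on this system: skip it.
                pass
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return found

# Example candidates: one UTF-8 locale and one non-ASCII, non-UTF-8 encoding.
candidates = ["C", "en_US.UTF-8", "ja_JP.EUC-JP"]
usable = available_locales(candidates)
```

A test suite could then be run once per entry in usable, which also
shows directly how few locales a minimal chroot provides.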

ciao
	cate



