Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

To: Andrew McMillan <andrew@morphoss.com>, 522776@bugs.debian.org
Cc: Adeodato Simó <dato@net.com.org.es>
Subject: Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
From: "Giacomo A. Catenazzi" <cate@debian.org>
Date: Wed, 08 Apr 2009 10:15:27 +0200
Message-id: <49DC5D1F.4020203@debian.org>
Reply-to: "Giacomo A. Catenazzi" <cate@debian.org>, 522776@bugs.debian.org
In-reply-to: <1239139060.4580.51.camel@happy.mcmillan.net.nz>
References: <20090406120655.27815.2545.reportbug@lenny.mirbsd.org> <49DA0B6A.7060107@debian.org> <Pine.BSM.4.64L.0904061727410.28766@herc.mirbsd.org> <20090406180917.GA23092@dario.dodds.net> <20090407203246.GA4158@chistera.yi.org> (sfid-20090408_084041_081980_2AF2E1E2) <1239139060.4580.51.camel@happy.mcmillan.net.nz>

Andrew McMillan wrote:
> On Tue, 2009-04-07 at 22:32 +0200, Adeodato Simó wrote:
> > It is my impression that more packages than mksh could use an UTF-8
> > locale at build time (I’m afraid I don’t have pointers, but I’m sure
> > I’ve come across at least a couple).
> >
> > Wouldn’t it be just better to change Debian’s default to make an UTF-8
> > locale available by default, rather than to force all those packages to
> > play tricks with LOCPATH?
>
> I too would really like to see a UTF-8 locale available by default, and
> would prefer to see this be the C.UTF-8 locale, which doesn't screw with
> the collation / character type settings like any other UTF-8 locale
> would.
>
> It seems to me that the consensus here is that having a UTF-8 locale
> available is a good idea and I don't hear any very strong argument
> against such a change.
>
> Consequently I think we should move on from the discussion and start
> working out a patch to resolve this in policy.


So I have a question: what does UTF-8 mean in this context (C.UTF-8)?

It is not a stupid question, and the answer is not "the UTF-8 algorithm
for encoding/decoding Unicode".
I still think that you are confusing the various meanings.
And until I understand the problem, I cannot propose a solution.

- terminals should be sensitive to charsets when choosing how to display
  things
- programs should be sensitive to locales (the topic of this discussion):
  a locale provides some charset-dependent strings and the interpretation
  of the various characters, but (usually) it MUST NOT translate characters.
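
As a rough illustration of what a locale supplies (conventions and
interpretation, not character translation), the sketch below asks the
plain "C" locale for its numeric conventions and its collation order.
The function name c_locale_facts is made up for this example; only the
Python standard library "locale" module is used.

```python
import locale


def c_locale_facts():
    """Return a few conventions supplied by the plain "C" locale."""
    # The "C" locale is always available, so this call cannot fail.
    locale.setlocale(locale.LC_ALL, "C")
    conv = locale.localeconv()
    words = ["b", "A", "a", "B"]
    return {
        # Strings the locale provides for formatting numbers.
        "decimal_point": conv["decimal_point"],
        "thousands_sep": conv["thousands_sep"],
        # Collation in "C" is plain code-point order: upper case first.
        "sorted": sorted(words, key=locale.strxfrm),
    }


facts = c_locale_facts()
print(facts)
```

The characters themselves are never changed; only how they are ordered
and how numbers are punctuated depends on the locale.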

Anyway:

The locale C is already a UTF-8 compatible locale.
No? Then what is it missing?
- other alphabetic, numeric, currency, whitespace characters?  But no UTF-8
  locale provides all characters: each defines only the range needed for its
  language [see Wikipedia, which should treat UTF-8 as binary for this reason].
  The "C" "spoken" language requires only 7-bit ASCII (or maybe only a subrange of it).
  So why do we need further characters?
  Note: POSIX restricts blank characters in the "C" locale to only two values
  (space and tab).

  We could use the UTF-8 charset for the C locale, declaring all
  c > 127 unused/illegal.  Would this solve the problems with mksh? I don't think so,
  so what do you need from this C.UTF-8?
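
To make the "UTF-8 charset with every c > 127 declared illegal" idea
concrete, here is a minimal sketch. The helper names is_ascii7 and
decode_strict_c are hypothetical and not part of any existing locale
implementation; the point is only that 7-bit ASCII text is already valid
UTF-8, byte for byte, while anything outside that range needs bytes
above 127.

```python
def is_ascii7(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range 0..127."""
    return all(b < 128 for b in data)


def decode_strict_c(data: bytes) -> str:
    """Decode as UTF-8, but reject any byte above 127 (hypothetical rule)."""
    if not is_ascii7(data):
        raise ValueError("byte > 127 is not allowed in this restricted charset")
    return data.decode("utf-8")


plain = "plain build"
# ASCII text has the same byte representation in ASCII and in UTF-8.
same_bytes = plain.encode("ascii") == plain.encode("utf-8")

# A non-ASCII character needs bytes above 127 when encoded as UTF-8.
name_bytes = "Simó".encode("utf-8")

print(same_bytes, is_ascii7(name_bytes), decode_strict_c(b"mksh"))
```

Under such a rule a string like "Simó" would be rejected outright, which
is why this restricted variant would probably not help a package that
wants to process real UTF-8 input at build time.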

I still think that "en_US.UTF-8" is the right default (note:
I'm not a US citizen, nor do I speak English).

The installer will install the correct locale, so the en_US period is very
short (we'll dominate them ;-) ).

On debootstrap/pbuild/... things are different.  But if this is the problem,
let's look for a solution for the build environment (and I still think that in this
environment "en_US.UTF-8" could be nice).

But I would prefer a simple, basic 7-bit ASCII "C" locale for a basic/plain build;
only after the packager has decided whether having a specific build with UTF-8 is
a bug or a feature should they set it manually.
Why does a build need to depend on a locale?
The UNIX way is to allow compiling things for a remote system (maybe another OS,
another arch).
For testing? Then why not test various locales (UTF-8, but also other
non-ASCII-based encodings)?
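
Testing under several locales could look roughly like the sketch below.
It is only an assumption about how a test step might be driven: the
function run_under_locales and the probe command are invented for this
example, and the locale names are just samples. Note that setting LC_ALL
to a locale that is not installed is not detected here; programs
typically fall back to "C" behaviour in that case.

```python
import os
import subprocess
import sys


def run_under_locales(cmd, locales):
    """Run cmd once per locale name, with LC_ALL set accordingly."""
    results = {}
    for name in locales:
        # Copy the current environment and override only LC_ALL.
        env = dict(os.environ, LC_ALL=name)
        proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
        results[name] = (proc.returncode, proc.stdout)
    return results


# Stand-in for a real test suite: the child just reports its LC_ALL.
probe = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
results = run_under_locales(probe, ["C", "en_US.UTF-8", "ja_JP.EUC-JP"])
for name, (code, out) in results.items():
    print(name, code, out.strip())
```

A package would replace the probe with its own test command and compare
the results across locales.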

ciao
	cate
