Bug#99324: Default charset should be UTF-8

To: Marco d'Itri <md@Linux.IT>, 99324@bugs.debian.org
Subject: Bug#99324: Default charset should be UTF-8
From: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>
Date: Fri, 1 Jun 2001 14:45:12 +0200
Message-id: <20010601144512.A28881@melkor.dnp.fmph.uniba.sk>
Reply-to: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>, 99324@bugs.debian.org
In-reply-to: <20010601135642.B1987@wonderland.linux.it>; from md@Linux.IT on Fri, Jun 01, 2001 at 01:56:42PM +0200
References: <20010601133112.B7815@cibalia.gkvk.hr> <20010601135642.B1987@wonderland.linux.it>

On Fri, Jun 01, 2001 at 01:56:42PM +0200, Marco d'Itri wrote:
> On Jun 01, Josip Rodin <joy@cibalia.gkvk.hr> wrote:
> 
>  >Nice things these general tendencies... in my country we still have problems
>  >using ISO 8859-2 because Windows 1250 has polluted everything. Adding
>  >another one to the pile is likely to screw things up even more. <sigh>
> This is the reason we can't just switch the terminals to UTF-8, there
> are way too many programs which can't correctly recode ISO-8859-* text,
> because they are broken or because the charset is unlabeled.

So we first make them work with ISO-8859-*, then work on making applications
work with UTF-8, and then work on making those terminals display UTF-8?
I can see a shortcut here...

> Let's first fix the software, then we'll talk about using UTF-8 by
> default for everybody.

But fix it which way?
To support CP1250? ISO-8859-2? CP852?
Or KOI8-R? CP1251? ECMA? CP866?
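
To make the question concrete, here is a minimal Python sketch that reads
one and the same unlabeled byte under several of the charsets named above.
The helper name decode_under is made up for illustration; the codec names
are the ones shipped with Python's standard library.

```python
def decode_under(raw, charsets):
    """Decode the same bytes under each charset and return the results."""
    return {name: raw.decode(name) for name in charsets}


# A single byte with no charset label attached to it.
raw = b"\xb9"

# Every charset gives a different, equally "valid" reading of that byte.
readings = decode_under(
    raw, ["cp1250", "iso8859_2", "cp852", "koi8_r", "cp1251", "cp866"]
)
for name, text in readings.items():
    print(name, repr(text))
```

ISO-8859-2 reads the byte as "š" and CP1250 reads it as "ą"; without a
label there is no way for the software to know which one was meant.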

Better to concentrate on UTF-8; it does indeed solve many
problems.
Better to let the old terminals die... if we had not let
the old Czechoslovak Kamenicky encoding die, but had
focused instead on fixing the software, we would have a much
bigger mess than there is now.

All the i18n stuff in glibc is a bit flawed... it assumes you
NEVER want to change the default locale while the program is running,
and it assumes everybody has a correct terminal.

Have you seen the konwert package? It is really nice; glibc's
iconv(3) should have been like this, and there would be one problem less...

Ideal (under the circumstances) would be:

have glibc work internally in UTF-8 unconditionally
Output is transliterated according to the terminal charset
(ideally UTF-8, so no conversion is necessary).
The terminal charset can be switched _on the fly_, maybe
via SIG-SOMETHING to glibc

locales are in UTF-8 unconditionally
isprint(3) returns nonzero for UTF-8 characters (fuzzy here... but it
definitely should not be tied to the locale); actually displaying the
character is handled by a konwert-like output routine

You want your ISO-8859-1 console with locales?
No problem, do export OUTPUTCHARSET=ISO-8859-1 and glibc
will transliterate any Russian fortunes into Latin script...
and strip diacritics from Slovak names.
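
A rough Python sketch of such an output routine, under loud assumptions:
the function name to_terminal is hypothetical, the charset is passed in
as a plain argument rather than read from the environment, and the
character-by-character handling only makes sense for stateless
single-byte charsets. It strips diacritics only; turning Cyrillic into
Latin script would need real transliteration tables, which this sketch
does not have, so such characters come out as "?".

```python
import unicodedata


def to_terminal(text, charset):
    """Encode text for a terminal, degrading characters the charset lacks."""
    out = bytearray()
    for ch in text:
        try:
            # Characters the terminal charset can represent pass unchanged.
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # Otherwise decompose, drop the combining marks, and keep the
            # base letter; anything still unencodable becomes "?".
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(
                c for c in decomposed if not unicodedata.combining(c)
            )
            out += base.encode(charset, errors="replace")
    return bytes(out)


print(to_terminal("Ľubomír Šťastný", "ascii"))
print(to_terminal("Ľubomír Šťastný", "iso8859_1"))
```

On an ISO-8859-1 terminal the "í" and "ý" survive because that charset
has them, while "Ľ", "Š" and "ť" fall back to their base letters.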

readline, stty & co. are UTF-8 aware
input can be recoded to UTF-8 if necessary, but ideally it is
already coming in UTF-8 (the biggest problems are text editors here, maybe
use filterm(1) for old applications)
Input encoding, too, can be changed on the fly.
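
The input side could be sketched in Python along these lines. The class
InputRecoder and its methods are invented for illustration; only the
incremental decoders from the standard codecs module are real. Changing
the charset simply swaps the decoder, which is the "on the fly" part.

```python
import codecs


class InputRecoder:
    """Recode incoming bytes to UTF-8; the input charset may change later."""

    def __init__(self, charset="utf-8"):
        self.set_charset(charset)

    def set_charset(self, charset):
        # A fresh incremental decoder; undecodable bytes become U+FFFD.
        self._decoder = codecs.getincrementaldecoder(charset)(errors="replace")

    def feed(self, raw):
        # Incomplete multi-byte sequences are buffered until completed.
        return self._decoder.decode(raw).encode("utf-8")


recoder = InputRecoder("iso8859_2")
print(recoder.feed(b"\xb9"))   # "š" typed on an ISO-8859-2 keyboard
recoder.set_charset("cp1250")
print(recoder.feed(b"\x9a"))   # the same letter arriving as CP1250
```

Both calls yield the same UTF-8 bytes, so everything behind the recoder
only ever sees one encoding.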

have X work in UTF-8
xkb sends UTF-8, default font encoding is UTF-8

allow UTF-8 in /etc/passwd. Damn. I was bitten by this a few days ago.
8-bit chars in GECOS behave unpredictably
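
As a small aid for spotting the problem, here is a Python sketch that
checks whether the GECOS field of a passwd line is valid UTF-8. The
function name and the sample entries are made up; the only thing taken
as given is the usual seven colon-separated fields with GECOS in fifth
place.

```python
def gecos_is_utf8(passwd_line):
    """Return True if the GECOS field of a raw passwd line is valid UTF-8."""
    fields = passwd_line.rstrip(b"\n").split(b":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields")
    try:
        fields[4].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Made-up entries: the same name stored as UTF-8 and as ISO-8859-2.
utf8_line = b"simon:x:1000:1000:\xc5\xa0imon:/home/simon:/bin/sh\n"
latin2_line = b"simon:x:1000:1000:\xa9imon:/home/simon:/bin/sh\n"
print(gecos_is_utf8(utf8_line), gecos_is_utf8(latin2_line))
```

This only tells the two cases apart; it says nothing about what the
tools that read /etc/passwd will do with either of them.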

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Follow-Ups:
- Bug#99324: Default charset should be UTF-8
  - From: Marco d'Itri <md@Linux.IT>

References:
- Re: Bug#99324: Default charset should be UTF-8
  - From: Josip Rodin <joy@cibalia.gkvk.hr>
- Bug#99324: Default charset should be UTF-8
  - From: Marco d'Itri <md@Linux.IT>
