Bug#99933: second attempt at more comprehensive unicode policy

To: Richard Braakman <dark@xs4all.nl>, 99933@bugs.debian.org
Subject: Bug#99933: second attempt at more comprehensive unicode policy
From: Colin Walters <walters@debian.org>
Date: 06 Jan 2003 00:21:27 -0500
Message-id: <1041830487.15092.20.camel@space-ghost>
Reply-to: Colin Walters <walters@debian.org>, 99933@bugs.debian.org
In-reply-to: <20030106030032.GA1754@night>
References: <1041533855.15063.19.camel@space-ghost> <1041546314.22038.9.camel@space-ghost> <20030103231158.GB8502@tatonka.pfalz.de> <1041648625.21808.28.camel@space-ghost> <87isx4q588.fsf@orcus.priv.at> <1041700241.32717.35.camel@space-ghost> <20030105142317.GB1699@zobe.linuxfr.org> <1041786548.9879.8.camel@space-ghost> <20030105201303.GA23475@zobe.linuxfr.org> <1041819155.14620.9.camel@space-ghost> <20030106030032.GA1754@night>

On Sun, 2003-01-05 at 22:00, Richard Braakman wrote:
> On Sun, Jan 05, 2003 at 09:12:36PM -0500, Colin Walters wrote:
> >  However, if these programs display
> > them to the user on a tty, it will be necessary to convert them to the
> > user's locale encoding
> 
> Hmm.  Remember the far more common case of a program that takes a
> filename on the command line and then tries to open it.  The user
> would have typed it in the local encoding, so it needs conversion.

That's true.  Hm.  Maybe the best approach will be to first just
implement Unicode and UTF-8 support in more programs, so that UTF-8 is
how they handle filenames (and strings in general) internally, much like
GNOME programs do now.  This is all well and good, I think.
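
As a rough illustration of the command-line case Richard mentions, here
is a minimal sketch in Python.  The function names and the sample
filename are hypothetical, and the locale encoding is passed in
explicitly as an assumption rather than queried from the environment.
It only shows the idea: decode the argument from the locale encoding,
then keep it as Unicode (serialized as UTF-8) internally.

```python
def arg_to_text(raw_arg: bytes, locale_encoding: str) -> str:
    """Decode a filename argument, typed in the user's locale encoding,
    into Unicode text for internal use (hypothetical helper)."""
    return raw_arg.decode(locale_encoding)


def text_to_utf8(text: str) -> bytes:
    """Serialize the internal Unicode text as UTF-8 (hypothetical helper)."""
    return text.encode("utf-8")


# Assumed example: the user typed "café.txt" on an ISO-8859-1 terminal,
# so the program receives the byte 0xE9 for the accented letter.
raw = b"caf\xe9.txt"
name = arg_to_text(raw, "iso-8859-1")
utf8_name = text_to_utf8(name)
```

Note that the same bytes would not decode as UTF-8 at all, which is why
the program has to know (or be told) which encoding the user typed in.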

The bigger question is what to do for programs that create or rename
files, especially from user input.  Should they try to convert filenames
back into the locale encoding?  I would say no, because 1) it could fail
if the locale encoding can't encode certain characters and 2) it will
just prolong the brokenness.  Programs like 'touch', though, which do
not look at the filename at all, should not be changed at all, I think.
They will create a file whose name uses the same encoding as the
argument given to them.
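
To make point 1 concrete, here is a small sketch in Python of how
converting a name back into a legacy locale encoding can fail.  The
helper name and the sample filenames are made up for illustration, and
ISO-8859-1 stands in for whatever legacy encoding a user might have.

```python
from typing import Optional


def to_locale_encoding(name: str, locale_encoding: str) -> Optional[bytes]:
    """Try to convert a Unicode filename back into the locale encoding.

    Returns None when the encoding cannot represent some character
    (hypothetical helper, for illustration only).
    """
    try:
        return name.encode(locale_encoding)
    except UnicodeEncodeError:
        return None


# Representable in ISO-8859-1, so the conversion succeeds.
latin = to_locale_encoding("café.txt", "iso-8859-1")

# Not representable in ISO-8859-1, so the conversion fails.
japanese = to_locale_encoding("日本語.txt", "iso-8859-1")

# UTF-8 can encode any Unicode filename, so this never fails.
as_utf8 = to_locale_encoding("日本語.txt", "utf-8")
```

A program that insisted on the locale encoding would have to refuse or
mangle the second name; writing UTF-8 directly avoids that case.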

After we have a "sufficient" number of programs supporting UTF-8
natively in this way, we change the policy on filenames to a "must",
drop support for legacy terminals and encodings, and switch everyone to
a UTF-8 terminal and a UTF-8 locale.

My guess is that this could happen some time after sarge's release.  For
sarge, we could (and probably should) make the default locale for new
installations be UTF-8.  After we've switched to a UTF-8 locale for
everyone, programs will no longer need the code to handle legacy
encodings.  It will probably still be useful to keep it though, because
the legacy encodings will be around for a long time, and we want things
to Just Work as much as possible.

So again, after this current policy proposal is accepted, it will still
not be an RC bug to not have UTF-8 support; but people will know that it
is coming.

What do you think?
