Bug#99933: second attempt at more comprehensive unicode policy

To: Colin Walters <walters@debian.org>
Cc: 99933@bugs.debian.org
Subject: Bug#99933: second attempt at more comprehensive unicode policy
From: starner@okstate.edu
Date: Wed, 15 Jan 2003 19:30:57 -0600 (CST)
Message-id: <521777529.1042680657708.JavaMail.root@dexter.okstate.edu>
Reply-to: starner@okstate.edu, 99933@bugs.debian.org

>On Tue, 2003-01-14 at 21:50, starner@okstate.edu wrote:
>
>> And? A POSIX filename is not a string of characters, it's a string 
>> of bytes. You have no technical need to differentiate between the
>> two.
>
>If you do any sort of character-oriented manipulation on those names,
>you will.

Like what? How much character-oriented manipulation are you going 
to be doing on the whole system? When you're playing with your 
own files, you don't have a problem. How much fine manipulation
are you going to be doing with someone else's files?
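
To make the bytes-versus-characters point concrete, here is a minimal
Python sketch. The helper names are hypothetical, and the check ignores
length limits such as NAME_MAX. It only illustrates that a name can be
a perfectly legal POSIX filename component without being well-formed
text in a given charset.

```python
# Hypothetical helpers for illustration only; not taken from any project.

def is_valid_posix_component(name: bytes) -> bool:
    """A filename component is any non-empty byte string that contains
    neither NUL nor '/'. No character set is involved in this check."""
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


def decodes_as(name: bytes, charset: str) -> bool:
    """Report whether the bytes happen to be well-formed in `charset`."""
    try:
        name.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# "café" as written by a program running in a Latin-1 locale.
latin1_name = "caf\u00e9".encode("latin-1")

print(is_valid_posix_component(latin1_name))  # legal as a filename
print(decodes_as(latin1_name, "latin-1"))     # readable as Latin-1
print(decodes_as(latin1_name, "utf-8"))       # not well-formed UTF-8
```

The kernel only ever applies the first check; the other two matter
solely to programs that choose to interpret the bytes as characters.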

>> Good. It reminds me not to have filenames that I have no way of entering
>> into the computer.
>
>Well, that may be fine for you, but can you say it's fine for everyone
>in the world?

How many people in the world who don't speak a CJK language want
filenames in Chinese ideographs? I'm a languages geek - I own
dictionaries from languages I don't know, to languages I don't know.
I still don't want random ideographs in filenames on my system. My
parents? My family? They might have to call me in for tech support.
I don't know anyone who doesn't speak a CJK language who would want it.

And if you want to fix it, that's easy. Switch to a UTF-8 locale.

>Well, hopefully most shell scripts would not be directly referencing the
>files on the system, so they will continue to work.

And what if they are? Are you going to tell me that shell scripts cannot
reference an arbitrary filename on the system?

>Now, this is interesting.  I had thought that the general consensus in
>the free software community at large was that UTF-8 is the only sane
>charset for filenames, and to not attempt complete support for filenames
>in the locale charset.  At least this is quite obviously the position
>taken by GNOME.  Do you have any suitable references for projects which
>take a different approach?

Everyone else? I don't know of an example besides GNOME that regards
filenames as UTF-8 by default -- everyone else just treats them as
locale. It would add a lot of code to some programs to do otherwise.

>By <middle dot> I'm assuming you mean U+00B7 '·'.  It seems to me that
>in the chain above, Program 1 is a trusted program; it is doing
>validation on network input.  So it is a bug in that program, or its
>configuration, for it to execute any programs which might do something
>untrusted.

What programs convert from locale charset to UTF-8 for filenames,
or vice versa? When? Unless you can state clearly and unambiguously
when that happens, this problem will pop up; even if you can, it
still will.
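
As a rough illustration of why the timing of such conversions matters,
the following Python sketch encodes one visible name under two
charsets. The sample name and the helper `filename_bytes` are
assumptions made for the example, not anything from the policy
proposal.

```python
# Hypothetical example: one visible name, two different on-disk names.

def filename_bytes(name: str, charset: str) -> bytes:
    """Bytes a program would pass to open() if it wrote `name`
    using `charset` as its filename encoding."""
    return name.encode(charset)


visible = "caf\u00e9"  # what the user types and sees

as_latin1 = filename_bytes(visible, "iso-8859-1")
as_utf8 = filename_bytes(visible, "utf-8")

# The two byte strings differ, so they name two distinct files.
print(as_latin1 != as_utf8)

# A program that assumes the locale charset shows the UTF-8 name
# as mojibake instead of the name the user typed.
print(as_utf8.decode("iso-8859-1"))
```

Unless every program in a chain agrees on which of the two forms it
reads and writes, a name checked in one form can be acted on in the
other.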
