
Re: Bug#99933: second attempt at more comprehensive unicode policy



On Mon, 6 Jan 2003, Richard Braakman wrote:

> I guess this conversion should be done by the user's shell, and all
> filename arguments on the command line should be encoded in UTF-8.
> Umm, except that the shell doesn't know which arguments are filenames.
> How should this be done?

I think you'd need to have all of argv converted to UTF-8 by the shell.

IMHO it can't work any other way. If, for instance, you have a directory
containing some Chinese UTF-8 filenames and you do:

ls <typed filename in latin-1> * 

then the only way ls ever has a hope of working is if it expects all of
argv to be UTF-8. Basically, I don't see any way that ls could do
automatic conversion and work in all cases. The shell must do it, because
only the shell knows the source encoding of each argument, and the only
character encoding the shell could use to pass that information to the
program is UTF-8.
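To illustrate the point above, here is a small sketch (mine, not from the original mail; the filename "résumé" is just an example) of why a program receiving raw argv bytes cannot reliably detect the encoding itself: many byte sequences decode "successfully" in more than one encoding, so there is no error to catch.

```python
# The same word encoded two ways:
latin1_name = "résumé".encode("latin-1")   # b'r\xe9sum\xe9'
utf8_name = "résumé".encode("utf-8")       # b'r\xc3\xa9sum\xc3\xa9'

# The UTF-8 bytes also decode without error as Latin-1 -- to mojibake,
# so a program cannot tell it guessed wrong:
print(utf8_name.decode("latin-1"))         # prints 'rÃ©sumÃ©'

# The reverse direction at least fails loudly (0xE9 is not a valid
# UTF-8 lead byte followed by a continuation byte here):
try:
    latin1_name.decode("utf-8")
except UnicodeDecodeError:
    print("Latin-1 bytes are not valid UTF-8 -- detection is asymmetric")
```

Since one direction silently produces garbage and the other merely errors out, per-argument guessing inside ls is hopeless; only the shell, which knows what encoding the user typed in, can normalize everything to UTF-8 before exec().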

The problem of output is further complicated; consider for instance:

find -type f
find -type f | xargs ls

Given what I just said, the first find must know it is talking to a
terminal that is not UTF-8 and convert its output; the second must know
it is talking to a pipe and output only UTF-8.
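The terminal-vs-pipe distinction above is exactly what isatty() reports. A minimal sketch (my illustration, not from the original mail; the function name and the Latin-1 default are assumptions) of how a tool like find could choose its output encoding:

```python
import sys

def output_encoding(is_tty, terminal_encoding="latin-1"):
    """Pick an output encoding for filename display.

    is_tty: result of sys.stdout.isatty() -- True when stdout is
    connected directly to a terminal, False for a pipe or file.
    """
    if is_tty:
        # Convert UTF-8 filenames to whatever the terminal displays.
        return terminal_encoding
    # Pipes and files get canonical UTF-8 bytes for the next program.
    return "utf-8"

print(output_encoding(sys.stdout.isatty()))
```

This is the behavior the mail argues is too fragile to push into every individual program, which is why the conclusion below favors fixing the terminal/pty layer instead.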

Frankly, I think it's unworkable to try to make individual programs
responsible for character conversion, except when processing files they
know for certain are in a particular encoding. The way forward must be to
implement UTF-8 at the terminal/pty layer and make that work as well as
possible for everyone concerned.

Jason
