
Re: character encoding



Where does encoding come into play in the handling of file names?  The kernel, I assume, just sees byte sequences, right?  When you interact with a terminal or other software, you enter a file name and hope that its encoding matches the one under which the name was created; otherwise the byte sequence passed to the underlying system call won't match. Is this an accurate description of the situation?
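
To make the question concrete, here is a minimal sketch (not from the thread) of how one typed name turns into two different byte sequences depending on the encoding in effect. The name "café.txt" and the variable names are only illustrative; the sketch compares bytes and does not touch the filesystem.

```python
# Illustrative file name, not taken from the thread: "café.txt".
# The \u00e9 escape keeps this snippet independent of the source encoding.
name = "caf\u00e9.txt"

# Bytes a program would hand to the kernel under a UTF-8 locale...
utf8_bytes = name.encode("utf-8")

# ...and under an ISO-8859-1 (Latin-1) locale.
latin1_bytes = name.encode("latin-1")

print(utf8_bytes)    # b'caf\xc3\xa9.txt'
print(latin1_bytes)  # b'caf\xe9.txt'

# A byte-wise comparison, which is all the kernel does, sees
# two unrelated names.
names_match = utf8_bytes == latin1_bytes
print(names_match)   # False
```

So a file created under one encoding would not be found by a lookup that encodes the same characters differently.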

On Dec 31, 2007 9:52 PM, Vincent Lefevre <vincent@vinc17.org> wrote:
On 2007-12-31 15:08:24 -0800, Kelly Clowers wrote:
> On Dec 31, 2007 1:41 PM, ChadDavis <chadmichaeldavis@gmail.com> wrote:
> > 3) What is the encoding of the file name?  Is this a feature of the
> > filesystem?
>
> This is also based on your locale.

And this is nasty: it means that if a user changes locales (or uses
different locales depending on the context), filenames will be
misinterpreted; this is also the case with system scripts that run
under the C locale. Also, different users using different locales
won't easily be able to share files.
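
As a rough illustration of that failure mode (a sketch, not part of the original message), take the bytes of a name created under a UTF-8 locale and interpret them under two other encodings. The example name is hypothetical, and what a given tool actually displays depends on how it handles undecodable bytes.

```python
# Bytes of the hypothetical name "café.txt" as written under a UTF-8 locale.
stored = b"caf\xc3\xa9.txt"

# A program running under a Latin-1 locale decodes each byte separately,
# so the two-byte UTF-8 sequence becomes two unrelated characters.
as_latin1 = stored.decode("latin-1")
print(as_latin1)  # cafÃ©.txt

# Under the C locale only ASCII is defined, so a strict decode fails.
try:
    stored.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)   # False
```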

Workaround 1: don't use non-ASCII characters in filenames. This
may not be very user-friendly, but this is 100% compatible with
everything.

Workaround 2 (if ASCII isn't sufficient): always use UTF-8. But be
careful about normalization problems (NFC/NFD...). Linux does not
normalize filenames, so you may get several files whose names look
the same (but are encoded differently) in the same directory.
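
The normalization issue can be sketched as follows, using Python's standard unicodedata module (the example name is an assumption made for illustration). The NFC and NFD forms render identically but encode to different UTF-8 byte sequences, so a filesystem that compares names byte by byte treats them as distinct.

```python
import unicodedata

# Hypothetical name "café.txt"; \u00e9 is the precomposed e-acute.
name = "caf\u00e9.txt"

# NFC keeps the precomposed character; NFD splits it into
# "e" followed by the combining acute accent U+0301.
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)  # b'caf\xc3\xa9.txt'
print(nfd_bytes)  # b'cafe\xcc\x81.txt'

# Same visible text, different bytes: both could exist side by side
# in one directory.
print(nfc_bytes == nfd_bytes)  # False

# Normalizing to one agreed form before creating or looking up a file
# is one way to avoid the duplicates.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```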

--
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)

