
Re: DEP 5 and directory/file names with spaces



Philipp Kern wrote:
On 2009-06-08, Giacomo A. Catenazzi <cate@cateee.net> wrote:
The <slash> is locale dependent. Thus a file created in another locale
could contain the character that the current locale interprets as
<slash>.
BTW, under the pathname resolution rules such a file could not be accessed,
but AFAIK the system calls that do no pathname resolution (like readdir)
permit <slash>.

[http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap03.html 3.169]
You are linking to the old POSIX specification. In the new one, point 3.170,
you can see that <slash> is now written between angle brackets, to emphasize
that <slash> is to be interpreted as a locale-dependent character.

So actually I just got hold of the new POSIX specification by using
IEEEexplore at university.  (Is it really true that you can't get it
freely!?)

The HTML version is free (to access) and still on the Open Group site.
You can access it via http://www.unix.org/2008edition/
(it still uses HTML frames, so it is not easy to bookmark or navigate).

3.347 says that <slash> represents the literal character '/', not
something locale-dependent.

But check chapter 6 "Character Set", which defines <slash> (in both the
new and the old version).

 Everything else would've been stupid IMHO,
if the old standard is still the only sanely available one and you
suddenly need to care about a different character.

No, it is the same in the old standard too.  The change was only a
typographical correction, to emphasize that <slash> is locale dependent.

Anyway, I agree that it is stupid, considering that not all charsets
are permitted in POSIX.
I find no charsets that could be used in POSIX locales in the
glibc sources (localedata/charmaps).
I would be happy if Debian forbade such locales. Note: that would be
against POSIX (programs and users can define their own locale and
"force" applications to use it), but we don't need to be too
POSIXly correct.

A rationale:
Unlike Windows (same value but different glyphs for the path separator),
POSIX chose the same glyph but different encodings.

But I can also see some good reasons:

POSIX is not a binary specification (like LSB), but a source-level
specification (C and shell), thus at the text level.
Thus using the same glyph is better and more portable.
All code and scripts should work fine with any value of <slash>
(if source and strings are in the correct locale).
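
To make "same glyph, different encoding" concrete, here is a minimal
Python sketch. It only uses Python's own codec names as stand-ins for
charmaps; cp037 (an EBCDIC code page) is my choice for illustration, and
whether any installed locale actually uses it is a separate question.
The helper name slash_bytes is hypothetical.

```python
# Encode the same glyph "/" under several codecs and compare the bytes.
# The codec list is an illustrative assumption, not a list of POSIX locales.
CODECS = ["ascii", "utf-8", "latin-1", "cp037"]


def slash_bytes(codecs=CODECS):
    """Return a dict mapping each codec name to the encoded bytes of '/'."""
    return {name: "/".encode(name) for name in codecs}


if __name__ == "__main__":
    for name, raw in slash_bytes().items():
        print(f"{name:8} 0x{raw.hex()}")
```

Under the ASCII-compatible codecs the slash is byte 0x2f; under cp037 it
is byte 0x61, which ASCII reads as "a". So a file named "a" by an ASCII
program would look like "/" to a program decoding names as cp037.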

The problem arises when importing binary files (like tar archives). In
this case no charset conversion is done and booom...
But this is not only a problem with slash, but potentially with
other chars (e.g. invalid char sequences in UTF-8).
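
A rough sketch of the kind of check an importer could do on the raw name
bytes it reads from such a file, assuming the names are expected to be in
one known encoding (UTF-8 by default here). The function check_name and
its return shape are hypothetical, not part of tar or any existing tool.

```python
def check_name(raw: bytes, encoding: str = "utf-8"):
    """Decode one file name and report an obvious problem, if any.

    Returns (decoded_name, problem); decoded_name is None when the bytes
    are not valid in the given encoding, problem is None when all is fine.
    """
    try:
        name = raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return None, f"invalid {encoding} sequence at byte offset {exc.start}"
    # A single path component must not contain a separator or a NUL.
    if "/" in name or "\x00" in name:
        return name, "contains a path separator or NUL"
    return name, None


if __name__ == "__main__":
    samples = ["café".encode("utf-8"), "café".encode("latin-1")]
    for raw in samples:
        print(raw, check_name(raw))
```

The Latin-1 bytes for "café" are rejected as UTF-8, and the single byte
0x61 checked with encoding="cp037" decodes to "/" and is flagged, which
is the slash case from above.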

This is a general problem with the exchange of binary files: either a
fixed encoding is needed, or the encoding should be recorded in
the binary file (with support for translating between encodings).

I think POSIX hugely improved locale portability, but some problems
remain.
There is only one perfect solution: one single charset
(like UTF-8).

ciao
	cate

