
Re: DEP 5 and directory/file names with spaces



For readability and maintainability (especially for people whose
editors might mask characters like these), I'd suggest escaping
certain characters in filenames in the usual form:

TAB becomes \t
CR becomes \r
LF becomes \n

etc.
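
To make the escaping above concrete, here is a minimal sketch in
Python. The function names are made up for illustration, and escaping
the backslash itself is my own assumption (without it the encoding
could not be reversed unambiguously); nothing here is part of DEP 5.

```python
# Hypothetical helpers; the table follows the TAB/CR/LF list above.
# Escaping "\" itself is an added assumption so the mapping is reversible.
ESCAPES = {"\\": "\\\\", "\t": "\\t", "\r": "\\r", "\n": "\\n"}
UNESCAPES = {"\\": "\\", "t": "\t", "r": "\r", "n": "\n"}


def escape_filename(name):
    """Replace TAB, CR, LF and backslash with two-character escapes."""
    return "".join(ESCAPES.get(ch, ch) for ch in name)


def unescape_filename(text):
    """Reverse escape_filename; unknown escapes are kept verbatim."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt is None:
            out.append("\\")  # trailing lone backslash, keep as-is
        elif nxt in UNESCAPES:
            out.append(UNESCAPES[nxt])
        else:
            out.append("\\" + nxt)  # not one of ours, leave untouched
    return "".join(out)
```

Note that spaces pass through unchanged, so a name like "e f.txt"
stays exactly as typed.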

I think too many escapes (backslashes) would add a great deal of noise
to the strings, so a format that avoids them and is more
human-readable (without sacrificing machine readability) would be a
great feature to me.

Another advantage is that diffs will work as expected, because there
is only one filename/file mask per line. With more than one per line,
there is a slight cognitive load in working out which file has
changed from a given line of changes.

So if you replace one file with another using my idea, the diff would
look like this:
--- a   2009-06-08 19:58:16.000000000 -0400
+++ b   2009-06-08 19:58:36.000000000 -0400
@@ -1,6 +1,6 @@
 Files: a
  b c d
- e f.txt
+ i j k
  g h.cfg
  i
  k

And if you use the escaped-spaces-and-commas format proposed before,
then it would look like this:
--- a   2009-06-08 20:00:03.000000000 -0400
+++ b   2009-06-08 20:00:17.000000000 -0400
@@ -1 +1 @@
-Files: a b\ c\ d e\ f.txt g\ h.cfg i k lj
+Files: a b\ c\ d i\ j\ k g\ h.cfg i k lj

While the latter format is significantly more compact, it's also much
less readable and therefore much more error-prone. Given how rare
newlines are in filenames, I think that providing an escape sequence
for them would be appropriate, and easier than escaping every space.
Anecdotal evidence suggests that spaces and commas (which would need
to be escaped) are used much more frequently in filenames than
newlines.

Granted, commas are probably less frequently used than spaces (except
in a few domain-specific cases, like CVS, which stores versioned files
as filename,v).

Of the three (commas, spaces, newlines) -- I think newlines are
probably used least often in filenames, and would therefore be most
appropriate as line separators.
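
The diff comparison earlier in this mail can be reproduced with a
rough Python sketch using the standard difflib module. The helper
names are invented for this example, and the file list is the same
toy one used in the diffs above.

```python
import difflib

old = ["a", "b c d", "e f.txt", "g h.cfg", "i", "k"]
new = ["a", "b c d", "i j k", "g h.cfg", "i", "k"]


def one_per_line(names):
    """Render a Files field with one name per continuation line."""
    return ["Files: " + names[0]] + [" " + n for n in names[1:]]


def escaped_single_line(names):
    """Render a Files field on one line, escaping spaces with backslashes."""
    return ["Files: " + " ".join(n.replace(" ", "\\ ") for n in names)]


def changed_lines(before, after):
    """Return only the added/removed lines of a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="")
    # Skip the "---"/"+++" file headers. This simple filter would also
    # skip content lines that happen to start with "--" or "++".
    return [
        line for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))
    ]


for line in changed_lines(one_per_line(old), one_per_line(new)):
    print(line)
for line in changed_lines(escaped_single_line(old), escaped_single_line(new)):
    print(line)
```

In the first format the changed lines contain only the two filenames
involved; in the second, both changed lines carry the entire list.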

Also, the examples above are simplistic; once you get fancy with the
backslashes, it can get really confusing to follow. While it's nice
to be able to run 'ls' (or whatever else) on those particular files,
I don't think globbing ability should be the *primary* goal. After
all, we can always write a simple Perl script to do that sort of
thing (read the Files field, output a glob list which we can pass to
ls).
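
As a rough illustration of such a script (in Python here rather than
Perl), the sketch below reads a Files field in the
one-name-per-continuation-line form and expands each entry as a glob.
The function names are hypothetical, and stripping surrounding
whitespace from each line is a simplifying assumption: a name that
really begins or ends with a space would need extra handling.

```python
import glob


def parse_files_field(text):
    """Return one filename/mask per line of a 'Files:' field."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("Files:"):
        raise ValueError("expected a field starting with 'Files:'")
    entries = [lines[0][len("Files:"):].strip()]
    for line in lines[1:]:
        if not line.startswith((" ", "\t")):
            break  # not a continuation line, so the field has ended
        entries.append(line.strip())
    return [entry for entry in entries if entry]


def expand(entries):
    """Expand each entry as a glob pattern, standing in for 'ls'."""
    matches = []
    for pattern in entries:
        matches.extend(sorted(glob.glob(pattern)))
    return matches


field = "Files: a b\n c d\n e\n f\n"
print(parse_files_field(field))
```

Because every line is exactly one entry, spaces inside names need no
escaping before they reach glob.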

Cheers,

Jonathan

On Mon, Jun 8, 2009 at 7:52 PM, Gunnar Wolf<gwolf@gwolf.org> wrote:
> Jonathan Yu said [Mon, Jun 08, 2009 at 07:35:56PM -0400]:
>> Since nobody seems to have noticed, I'd like to re-propose my idea for
>> consideration:
>>
>> Files: a b
>>  c d
>>  e
>>  f
>>
>> (ie, using continuation lines to specify lists of files, rather than
>> commas or anything else. No escaping necessary.)
>
> Yup - But the newline is also a valid (although, yes, very uncommon)
> part of a filename.
>
> Now, this proposal keeps the field RFC822-ish — We could extrapolate
> this a bit, and accept basically any non-whitespace strings delimited
> by whitespace. Newlines are just one form of whitespace. And, of
> course, you can escape any whitespace character to prevent it from
> being treated as whitespace.
>
> --
> Gunnar Wolf - gwolf@gwolf.org - (+52-55)5623-0154 / 1451-2244
> PGP key 1024D/8BB527AF 2001-10-23
> Fingerprint: 0C79 D2D1 2C4E 9CE4 5973  F800 D80E F35A 8BB5 27AF
>

