Re: piping find to zip -- with spaces in path



Dan B. wrote:
> Bob Proulx wrote:
> >So as you can see whitespace isn't safe to use in URLs.  This is
> >basically the same as for Unix filenames.
> 
> They're not quite the same:

Not quite the same is basically the same here.  :-)

The original question in this thread was:

  ... what about urls?  They come from the Unix world, and are full of
  underscores and question marks and equal signs.  Then there are
  emails, all of which require the @ sign.  Not complaining, just
  asking.

I think "basically the same" describes things adequately.  People with
a Unix background wouldn't normally include spaces in either file
names or URLs, or in other related "handles" to data.  If you do, then
they are much more of a pain to manipulate in shell scripts.  And so
you just don't do it, and don't think about whether it is technically
possible or not.  A lot of scripts don't handle whitespace because
there was no need to put in the effort to make them handle it.  They
were good enough for the task regardless.
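
For the find-to-zip case in the subject line, here is a minimal sketch,
assuming a POSIX sh, of the whitespace problem such scripts run into.
The helper count_args and the file name are made up for the
demonstration; count_args just reports how many arguments a command
would receive.

```sh
#!/bin/sh
# Sketch: the shell splits unquoted expansions on whitespace.
# count_args and "my file.txt" are hypothetical, invented for the demo.
count_args() { echo "$#"; }

name="my file.txt"
unquoted=$(count_args $name)    # split on the space: 2 arguments
quoted=$(count_args "$name")    # kept whole: 1 argument

# The same splitting bites "for f in $(find ...)": one file, two words.
# (Assumes the mktemp directory path itself contains no whitespace.)
dir=$(mktemp -d)
touch "$dir/$name"
words=0
for f in $(find "$dir" -type f); do
    words=$((words + 1))
done
rm -r "$dir"

echo "unquoted=$unquoted quoted=$quoted words=$words"
```

Quoting every expansion ("$name") is the usual fix inside a script; the
loop over $(find ...) has no such easy fix, which is why it is best
avoided.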

> In URIs, it's not that whitespace "isn't safe to use"; it's simply
> that whitespace is not allowed, period.  (Yes, encodings of whitespace
> characters are allowed, but that encoding still contains no actual
> whitespace characters.)

No.  Actually it was exactly that, "unsafe".  *Exactly* as I said.

  RFC 1738
  "The space character is unsafe because ..."

Spaces are literally documented as being "unsafe".  Later RFCs have
clarified this somewhat.  But despite being unsafe, most software does
actually accept them.  (I sometimes see them inappropriately used in
slug lines.)

  wget -O- "http://www.example.com/one two three.html"
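
If you'd rather not depend on software tolerating the raw space,
percent-encoding it as %20 sidesteps the question.  A rough sketch,
assuming a POSIX sh with sed; it only handles the space and is not a
general URL encoder:

```sh
#!/bin/sh
# Sketch: percent-encode the spaces in the example URL before use.
# Only the space (0x20 -> %20) is handled; other unsafe characters are not.
url="http://www.example.com/one two three.html"
encoded=$(printf '%s\n' "$url" | sed 's/ /%20/g')
echo "$encoded"
# wget -O- "$encoded"   # the actual fetch, left commented out
```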

And even though the space is not among the characters it permits,
RFC 3986 includes this statement:

   Using <> angle brackets around each URI is especially recommended
   as a delimiting style for a reference that contains embedded
   whitespace.
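
To illustrate why that delimiting style helps, here is a sketch (POSIX
sh and sed assumed; the sentence is made up) that pulls a URI
containing spaces out of running text by its angle brackets.  A parser
that stopped at the first whitespace would instead end the URI at
"one".

```sh
#!/bin/sh
# Sketch: extract a URI wrapped in <...> from surrounding text.
# Assumes exactly one bracketed URI on the line; the text is invented.
text='See <http://www.example.com/one two three.html> for details.'
uri=$(printf '%s\n' "$text" | sed -n 's/.*<\([^>]*\)>.*/\1/p')
echo "$uri"
```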

> Additionally, various other syntaxes and protocols build on that
> consistently (e.g., since URIs can never contain space characters,
> HTTP uses space characters as delimiters around URI references).

There is a difference between what a URL contains and how the start
and end of a URL are recognized from context.  RFC 3986 describes this
in detail in Appendix C, "Delimiting a URI in Context".
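
To see what goes wrong when a raw space lands inside an HTTP/1.1
request line (method, request-target and version separated by single
spaces), here is a sketch that splits such a line the way a naive
whitespace-based parser would.  The shell's read builtin stands in for
that parser; this is an illustration, not how any particular server is
written.

```sh
#!/bin/sh
# Sketch: request-line = method SP request-target SP HTTP-version.
# A raw space in the target shifts every field after it.
read -r method target version <<EOF
GET /one two three.html HTTP/1.1
EOF
echo "method=$method"      # GET
echo "target=$target"      # /one
echo "version=$version"    # two three.html HTTP/1.1
```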

> Unfortunately, on the other hand, Unix filenames have no
> corresponding specification, at least one that is followed
> consistently.  The kernel and file systems allow spaces, and
> some utilities/commands/scripts/etc. do, but many don't.

The Unix filesystem allows every character (byte) in a file name
except the zero (NUL) character.  Because the zero character marks
the end of the string, it cannot be used within the string.  And of
course '/' is used as the directory separator.  If an application
doesn't allow other characters, then that is arguably a bug in the
application.  (However, the application may document its limitations
and stop there.)  Core utilities will of course be okay, but I am sure
that fringe applications have bugs in them.
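
To show that even a newline is a legal file name character, here is a
sketch, assuming a POSIX sh and a find that supports -exec ... {} +.
The file name is invented.  Line-oriented processing of find's output
miscounts it, while -exec hands the path over intact.

```sh
#!/bin/sh
# Sketch: one file whose name contains a newline.
dir=$(mktemp -d)
nl='
'
touch "$dir/first line${nl}second line"

# wc -l sees two "names" because find prints the embedded newline.
lines=$(find "$dir" -type f | wc -l | tr -d ' ')

# -exec ... {} + passes each path as a single argument; nothing splits.
paths=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "lines seen by wc: $lines"
echo "paths seen by -exec: $paths"
rm -r "$dir"
```

With GNU or BSD find, -print0 piped into xargs -0 is the other common
way to get the same NUL-delimited safety when feeding file names to a
command such as zip.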

Bob
