
Re: Unix-ify File Names



Frank Terbeck wrote:
Mike McClain <mike.mcclain@nowhere.net>:
Frank Terbeck <ft@bewatermyfriend.org> wrote:

 for FILE in `ls *$1` ; do
...

b) it breaks on filenames with spaces (and other special characters).
...
    Using 'for i in `ls *`'-type loops breaks this and is one of the
    main reasons why people think spaces are bad in filenames.
    (They are not bad, ...

In what sense are they not bad?  Yes, they're certainly legal per the
filesystem and most tools that take filenames.  However, they and other
special characters do make it more difficult to handle arbitrary file
names.

For example, if someone wants to use ls's feature of sorting by date
(e.g., "ls -t *$1"), they can't reliably combine it with the for-loop
construct above.
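
To make the breakage concrete, here is a minimal sketch in POSIX sh.
The helper names count_ls_words and count_glob_matches are made up for
illustration; each counts how many loop iterations its construct
produces for files ending in a given suffix.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

count_ls_words() {
    # Unquoted command substitution: the shell splits the output of ls
    # on whitespace, so one name containing a space becomes two words.
    n=0
    for f in `ls *"$1"`; do
        n=$((n + 1))
    done
    echo "$n"
}

count_glob_matches() {
    # A glob expands to exactly one word per file, whatever characters
    # the name contains.
    n=0
    for f in *"$1"; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        n=$((n + 1))
    done
    echo "$n"
}
```

In a directory holding only "my file.txt" and "plain.txt",
count_ls_words .txt prints 3 while count_glob_matches .txt prints 2.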



Hey, is there any command for taking a filename and escaping/encoding
shell-special characters to make a string that, when parsed by the
shell, specifies that filename?  I'm thinking of something that would
work like this:

   for i in `encode_for_shell *` ; ...

(mapping each argument to a shell string for the argument's value)
or

   for i in `find ... -print0 | xargs -0 encode_for_shell` ; ...

or

   cmd="some_command"
   cmd="${cmd} `encode_for_shell $file_name_with_special_chars`"
   $cmd

(I'm thinking of something like Java's java.util.regex.Pattern.quote(String)
(see
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)
) or Ruby's Regexp::escape(...)
(see http://www.ruby-doc.org/core/classes/Regexp.html#M001216 ), but
escaping/encoding for shell parsing instead of for regular-expression
parsing.)
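
One way such an encoder could look, as a rough sketch in POSIX sh: wrap
the argument in single quotes and rewrite each embedded single quote as
'\''. The function name shell_quote is hypothetical. (bash's printf
builtin also offers a %q format for a similar purpose, but that is a
bash extension.) Note that the quoting only takes effect when the
resulting string is parsed by the shell again, for example through
eval; a plain unquoted $cmd expansion is word-split but never re-parsed
for quotes.

```shell
#!/bin/sh
# shell_quote is a hypothetical helper name, not a standard command.
shell_quote() {
    # 1. replace every ' with '\''
    # 2. put a ' at the start of the first line
    # 3. put a ' at the end of the last line
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# Possible use (the string has to go through eval to be re-parsed):
#   eval "some_command $(shell_quote "$file_name_with_special_chars")"
```

Single quotes are used because nothing inside them is special to the
shell, so the quote character itself is the only one that needs
escaping.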



> some people just do not know how to handle them properly.)

You might not be, but it sounds like you're blaming users.  Sometimes
it's developers of tools (including designers of formats) that don't
have an escape mechanism to handle spaces or other special characters
(or don't provide support for encoding special characters) who are to
blame.


I am aware that there are HOWTOs and other documents out there
that propagate 'for i in `ls *foobar*`' loops. I don't know why their
authors do this. If they didn't know better they shouldn't have
written a shell scripting HOWTO in the first place.

Unfortunately for those they mislead, those authors don't know enough
to know they don't know better.  (They must not be the type to dig
into things (e.g., shell syntax) to really understand them, or at
least enough to notice that they don't fully understand them yet.)


Some people use things like this instead:
[snip]
ls * | while read file ; do whatever_command "$file" ; done
[snap]

This is just a little better than the for loop. It still breaks in
some situations.

I see how it would break with a newline character in a file name.
What other cases break?
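
Two further cases come from read itself rather than from ls: without
-r, read treats backslashes as escape characters, and with the default
IFS it strips leading and trailing whitespace. A small sketch follows;
the function names naive_read and careful_read are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helpers showing what read does to a single input line.

naive_read() {
    # Plain read: backslashes are consumed and surrounding blanks dropped.
    printf '%s\n' "$1" | { read name; printf '%s' "$name"; }
}

careful_read() {
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    printf '%s\n' "$1" | { IFS= read -r name; printf '%s' "$name"; }
}
```

Given the name ' lead\back ' (with a leading and a trailing space),
naive_read yields "leadback" while careful_read returns the name
unchanged. Neither variant helps with newlines in file names.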

There is _no_ reason why 'ls' should ever be used to generate file
lists for loops of any kind.

What about things that ls does that the shell's expansion of wildcards
does not do (e.g., sorting by date or size)?

(Maybe ls should have an equivalent to find's "-print0" option.)
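
For the narrower case of picking the most recently modified file, a
glob loop can do without ls. The sketch below assumes the shell's test
builtin supports the -nt ("newer than") operator, which bash, dash, and
ksh do; portability beyond those is an assumption. The function name
newest_match is hypothetical, and this does not give a full sort by
date.

```shell
#!/bin/sh
# newest_match is a hypothetical helper: print the most recently
# modified file whose name ends in the given suffix.
newest_match() {
    newest=
    for f in *"$1"; do
        [ -e "$f" ] || continue          # nothing matched the pattern
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    printf '%s\n' "$newest"
}
```

Because the candidates come from a glob, names with spaces or other
special characters are compared and printed intact.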



Daniel


