
Re: Unix-ify File Names



Frank Terbeck wrote:
Daniel B. <REMOVEdanielCAPS@fgm.com>:
Frank Terbeck wrote:
Mike McClain <mike.mcclain@nowhere.net>:
Frank Terbeck <ft@bewatermyfriend.org> wrote:

 for FILE in `ls *$1` ; do
...
b) it breaks on filenames with spaces (and other special characters).
...
    Using 'for i in `ls *`'-type loops breaks this and is one of the
    main reasons why people think spaces are bad in filenames.
    (They are not bad, ...
In what sense are they not bad?  Yes, they're certainly legal per the
filesystem and most tools that take filenames.  However, they and other
special characters do make it more difficult to handle arbitrary file
names.

No. They are never bad. It just takes a bit of practice to get used to
doing things in a robust way.

But some common Unix tools aren't robust enough, in the sense of
providing consistent escape/encoding mechanisms to handle special
characters.

For example, Emacs' tags files use commas as delimiters, and (last I
knew) don't have an escape/encoding mechanism for representing a comma
_in_ a file name, so (again, last I knew) a Linux kernel file with
a comma in its name doesn't get processed right.


Some commands do provide fully general mechanisms.  (For example,
find's -print0 and xargs' -0 option can handle any possible file
pathname, including one with newline characters.)  However, many
commands do not.  That typically makes it very difficult to
handle "special" characters.
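
As a rough illustration of the NUL-delimited approach mentioned above
(a sketch only: the temporary directory and the sample file names are
made up for the demonstration, and it assumes a find and xargs that
support -print0 and -0):

```shell
#!/bin/sh
# Sketch: pass arbitrary file names from find to another command intact.
# The directory and file names below are hypothetical test data.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain" "$tmpdir/with space" "$tmpdir/star*name" "$tmpdir/new
line"

# find separates names with NUL bytes; xargs -0 splits only on NUL, so
# spaces, asterisks, and even newlines arrive as part of a single argument.
# The inner sh just reports how many arguments it received.
count=$(find "$tmpdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "arguments received: $count"

rm -rf "$tmpdir"
```

Each of the four files should reach the inner command as exactly one
argument, whatever characters its name contains.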






For example, if someone wants to use ls's feature of sorting by date
(e.g., "ls -t *$1"), they can't combine it with the for-loop construct
above (reliably).

Okay, I admit that sorting is one of the rare cases where

[snip]
find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2- | while IFS= read -r ; do
  ...
done
[snap]

or
...
loops are justified.

I think you missed my point--the question of how to (or whether one
can) use for i in `...` to loop over a list of file names that are
output by some arbitrary program.

The particular example of sorting by date with "ls -t" can be solved
by switching to an entirely different approach (using find and sort
as above).

However, what about the general case?

It sounds like for i in `...` doesn't have an escaping/encoding
mechanism that is sufficient to handle both (unescaped) asterisks
that represent wildcards and escaped/encoded asterisks that represent
literal asterisks.
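
A small sketch of why escaping inside the command's output doesn't
help (the sample string is invented for the demonstration): the
result of a command substitution goes through field splitting, but
backslashes and quotes in it are not interpreted as shell syntax.

```shell
#!/bin/sh
# Sketch: an "escaped" space in command output is still split on.
# $(...) behaves like backticks here; the string 'a\ b' is made-up data.
n=0
last=
for i in $(printf '%s\n' 'a\ b'); do
  n=$((n + 1))      # count the words the loop actually sees
  last=$i
done
echo "words: $n, last word: $last"
```

The loop sees two words, the first with the backslash still attached,
so the escape did not protect the space.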


Hey, is there any command for taking a filename and escaping/encoding
shell-special characters to make a string that, when parsed by the
shell, specifies that filename?  I'm thinking of something that would
work like this:

   for i in `encode_for_shell *` ; ...
[...]

No, that is not how shells work.

Maybe I gave the wrong kind of example (a for loop, which apparently
doesn't parse and interpret things enough) for asking about an
encode command.

What about when one is building up a command string in a variable,
say CMD, and then executing the assembled command via "$CMD"?

The string contained in the variable is parsed as a normal command,
right?  So any logical string values that contain shell-special
characters need to be encoded with the usual shell escape-sequence
syntax, right?

(E.g., if I want to delete a file named "xx*yy", I would have to type
something like:

    rm xx\*yy

on a manual command line, so if I wanted the command line

    $CMD

to execute that same rm command, CMD would have to contain the
string "rm xx\*yy" (e.g., set by the command line:

   CMD="rm xx\\*yy"

))

So if I were listing file names (e.g., with find -print0 and maybe some
further filtering with, say, grep) and I wanted to assemble a command
that operated on the named files without interpreting any shell-special
characters in the file names when the assembled command line was parsed
by the shell and executed, I would need to map the actual file names to
the shell representation of those file names.

For example, if the list included the name "xx*xx", an encoder could
map that string to the string "xx\*xx" (probably written "xx\\*xx" as
a literal in sh/bash/etc.), which could be appended to the command
string being assembled (and surrounded by whatever separators were
needed to separate it from earlier and later tokens in the command line).

My encode_for_shell command would be applicable to that case.

Is there any such command (or, say, built-in function)?
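
For illustration, here is a minimal sketch of what such an encoder
could look like in plain sh. The function name encode_for_shell is the
hypothetical one from above, and the file names are made up. It wraps
each argument in single quotes rather than adding backslashes, and the
assembled string is run with eval, since a plain $CMD expansion only
undergoes field splitting and pathname expansion, not quote or
backslash processing. One known limitation of this sketch: a name that
ends in a newline is not preserved. (bash's built-in printf also has a
%q format that serves a similar purpose.)

```shell
#!/bin/sh
# encode_for_shell: hypothetical helper, not an existing command.
# Prints each argument as a single-quoted shell word followed by a space.
encode_for_shell() {
  for arg in "$@"; do
    # Turn every ' into '\'' and wrap the whole thing in single quotes.
    printf "'%s' " "$(printf '%s\n' "$arg" | sed "s/'/'\\\\''/g")"
  done
}

# Demonstration with made-up file names in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'xx*yy' 'xxAyy'

# Assemble the command string, then let the shell parse it via eval.
CMD="rm -- $(encode_for_shell 'xx*yy')"
eval "$CMD"

# Only the file literally named xx*yy should be gone.
if [ -e 'xx*yy' ]; then literal_gone=no; else literal_gone=yes; fi
if [ -e 'xxAyy' ]; then other_kept=yes; else other_kept=no; fi
echo "literal removed: $literal_gone, other kept: $other_kept"

cd / && rm -rf "$workdir"
```

Without the encoding, the asterisk in the assembled string would be
treated as a wildcard and xxAyy would be deleted as well.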


... But if you are writing real scripts that are
supposed to work (with data you potentially don't know in
advance), you will need to do things in a proper and robust way.

Definitely.


Sorry for the lengthy mail. I hope I could make myself a little
clearer and didn't spread buggy code. :-)

No problem.  I agree with counteracting error-prone suggestions.

Daniel



