Re: Unix-ify File Names

To: debian-user@lists.debian.org
Subject: Re: Unix-ify File Names
From: Frank Terbeck <ft@bewatermyfriend.org>
Date: Thu, 19 Apr 2007 21:05:56 +0200
Message-id: <20070419190556.GA2959@fsst.voodoo.lan>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <4627A297.3050508@fgm.com>
References: <20070417073516.GD28724@fsst.voodoo.lan> <E1HdwHi-0005ic-92@playground.mcclains.net> <20070418061027.GA23011@fsst.voodoo.lan> <46265D3E.5050007@fgm.com> <20070418201658.GD23011@fsst.voodoo.lan> <4627A297.3050508@fgm.com>

Daniel Barclay <daniel@fgm.com>:
> Frank Terbeck wrote:
>> Daniel B. <REMOVEdanielCAPS@fgm.com>:
>>> Frank Terbeck wrote:
>>>> Mike McClain <mike.mcclain@nowhere.net>:
>>>>> Frank Terbeck <ft@bewatermyfriend.org> wrote:
>>>>>
>>>>>>>  for FILE in `ls *$1` ; do
>>> ...
>>>> b) it breaks on filenames with spaces (and other special characters).
>>> ...
>>>>     Using 'for i in `ls *`'-type loops breaks this and is one of the
>>>>     main reasons why people think spaces are bad in filenames.
>>>>     (They are not bad, ...
>>> In what sense are they not bad?  Yes, they're certainly legal per the
>>> filesystem and most tools that take filenames.  However, they and other
>>> special characters do make it more difficult to handle arbitrary file
>>> names.
>> No. They are never bad. It just takes a bit of practice to get used to
>> do things in a robust way.
>
> But some common Unix tools aren't robust enough, in the sense of
> providing consistent escape/encoding mechanisms to handle special
> characters.
>
> For example, Emacs' tags files use commas as delimiters, and (last I
> knew) don't have an escape/encoding mechanism for representing a comma
> _in_ a file name, so (again, last I knew) a Linux kernel file with
> a comma in its name doesn't get processed right.

So? Just because some programs limit the namespace of the files they
work with (which is _absolutely_ okay) does not mean that shell
scripts must follow those programs' limitations. The shell itself
handles whitespace in filenames just fine. There is no reason to avoid
robust techniques, and deliberately using techniques that can break
would be even worse.

> Some commands do provide fully general mechanisms.  (For example,
> find's -print0 and xargs' -0 option can handle any possible file
> pathname, including one with newline characters.)  However, many
> commands do not.  That typically makes it very difficult to
> handle "special" characters.

Most programs do support filenames with special characters (if they
don't, it is clearly a bug). They just depend on the shell handing them
the correct string.
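
To illustrate what "the correct string" means, here is a small sketch
(not from the original thread; count_args and the file name are made
up for the demonstration). The program only sees what the shell passes
on, so the quoting decides whether it gets one name or several pieces:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

name='my holiday photo.jpg'

# Unquoted: the shell splits the value on whitespace before the
# command ever sees it.
unquoted=$(count_args $name)

# Quoted: the value is passed through as a single argument.
quoted=$(count_args "$name")

echo "unquoted: $unquoted, quoted: $quoted"
```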

Btw: xargs is not needed if your find binary is reasonably POSIX
compliant. Just use '+' instead of ';' with the -exec option. (Yes, I
know that GNU find didn't support this for quite some time.)
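
A minimal sketch of the '-exec ... {} +' form, assuming a scratch
directory created only for this demonstration (the directory and file
names are made up, and mktemp is assumed to be available):

```shell
# Scratch directory with awkward names (hypothetical, illustration only).
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt" "$demo_dir/star*.txt"

# find passes the names to the command as separate arguments itself;
# no shell word splitting is involved, so neither xargs nor -print0
# is needed. The inner 'sh -c' just reports how many arguments it got.
count=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} +)
echo "arguments received: $count"

rm -r "$demo_dir"
```

With ';' instead of '+', the command would be run once per file
rather than once for a whole batch of files.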

>>> For example, if someone wants to use ls's feature of sorting by date
>>> (e.g., "ls -t *$1"), they can't combine it with the for-loop construct
>>> above (reliably).
>> Okay, I admit that sorting is one of the rare cases where
>> [snip]
>> find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2- | while IFS= read -r ; 
>> do
>>   ...
>> done
>> [snap]
>> or
> ...
>> loops are justified. 
>
> I think you missed my point--the question of how to (or whether one
> can) use for i in `...` to loop over a list of file names that are
> output by some arbitrary program.

You just do not need that. And if you do, it's very hard to get right.
'for i in *' supports _every_ filename you could think of (including
filenames with newlines). Why use something that is more expensive,
more error-prone and less powerful?
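
A quick way to see the difference, as a sketch (the scratch directory
and file names are made up; how ls prints a newline in a name when
writing to a pipe can vary between implementations, so only the glob
count is fixed here):

```shell
demo_dir=$(mktemp -d)
nl='
'
# Three files: one with a space, one with a newline, one plain.
touch "$demo_dir/a b" "$demo_dir/c${nl}d" "$demo_dir/e"

# Globbing: the shell produces exactly one word per file.
glob_count=0
for i in "$demo_dir"/* ; do
  glob_count=$((glob_count + 1))
done

# Command substitution: the output of ls is split on whitespace,
# so names containing spaces fall apart into several words.
ls_count=0
for i in `ls "$demo_dir"` ; do
  ls_count=$((ls_count + 1))
done

echo "files: 3, glob loop: $glob_count, ls loop: $ls_count"
rm -r "$demo_dir"
```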

> The particular example of starting with sorting by date with "ls -d"
> has the solution of changing to an entirely different solution (using
> find and sorting as above).

find is just the tool you want to use for recursive actions on files
(or specialized actions, like sorting). find is an external program,
but it does not need a file list as its arguments (it walks the
directory tree itself), which makes it the ultimate choice.

> However, what about the general case?
>
> It sounds like for i in `...` doesn't have an escaping/encoding
> mechanism that is sufficient to handle both (unescaped) asterisks
> that represent wildcards and escaped/encoded asterisks that represent
> literal asterisks.

I don't think you really understand what is happening here.

[snip]
% foo='bar\ baz'
% for i in `echo "$foo"` ; do echo "($i)" ; done
(bar\)
(baz)
[snap]

You _cannot_ escape things there. If you want to know what's going on,
consult the manual of your shell (or the respective POSIX document).
Normally, a section about 'Word Expansions' will describe what happens
in detail.
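
The same point as a small sketch (count_args is a made-up helper;
$(...) is used as the equivalent of backticks). The result of an
unquoted command substitution is only split into fields and globbed.
Quote characters inside it are ordinary data and are not interpreted
again:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

foo='"bar baz"'    # the value itself contains literal double quotes

# Unquoted substitution: field splitting only. The embedded quotes do
# not group anything, so the words are  "bar  and  baz"
split_count=$(count_args $(printf '%s\n' "$foo"))

# Quoted substitution: no splitting at all, the result stays one word.
whole_count=$(count_args "$(printf '%s\n' "$foo")")

echo "unquoted: $split_count, quoted: $whole_count"
```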

You see, this is not the type of thing you want to teach beginners.
Hence, beginners should avoid 'for i in `...`' loops. (Did you
realize that you dropped 'ls *glob*' from the backtick expression? If
you really know what you are doing, you can get these types of loops
more or less right, but _never_ _ever_ if the command gets its
file list via globbing.)

>>> Hey, is there any command for taking a filename and escaping/encoding
>>> shell-special characters to make a string that, when parsed by the
>>> shell, specifies that filename?  I'm thinking of something that would
>>> work like this:
>>>
>>>    for i in `encode_for_shell *` ; ...
>> [...]
>> No, that is not how shells work.
>
> Maybe I gave the wrong kind of example (a for loop, which apparently
> doesn't parse and interpret things enough) for asking about an
> encode command.

The parsing follows the shell's perfectly normal rules; whether you
use a loop or not does not matter.

> What about when one is building up a command string in a variable,
> say CMD, and then executing the assembled command via "$CMD"?
>
> The string contained in the variable is parsed as a normal command,
> right?  So any logical string values that contain shell-special
> characters needs to be encoded with the usual shell escape-sequence
> syntax, right?
>
> (E.g., if I want to delete a file named "xx*yy", I would have to type
> something like:
>
>     rm xx\*yy
>
> on a manual command line, so if I wanted the command line
>
>     $CMD
>
> to execute that same rm command, CMD would have to contain the
> string "rm xx\*yy" (e.g., set by the command line:
>
>    CMD="rm xx\\*yy"
>
> )
[...]
> Is there any such command (or, say, built-in function)?

It sounds like you are looking for 'eval'.
But this has got nothing to do with the original subject.
And this misunderstanding leads me to the conclusion that you should
read up on how the various expansions in POSIX shells work (and
probably on a few common pitfalls too, like the maximum size of
arguments for external processes). No offence.
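
For what it's worth, a sketch of the difference between expanding a
variable and running it through eval (count_args is a made-up helper
standing in for rm, so nothing gets deleted). Keep in mind that eval
executes whatever the string contains, so it is only safe with
strings you fully control:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

CMD='count_args xx\*yy "a b"'

# Plain expansion: $CMD is only field-split. The backslash and the
# quotes stay literal, so the arguments are  xx\*yy  "a  and  b"
plain=$($CMD)

# eval: the string is parsed again as shell input, so the escape and
# the quotes take effect and the arguments are  xx*yy  and  a b
evaled=$(eval "$CMD")

echo "plain: $plain, eval: $evaled"
```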

Regards, Frank

-- 
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925
