
Re: Unix-ify File Names



Daniel B. <REMOVEdanielCAPS@fgm.com>:
> Frank Terbeck wrote:
>> Mike McClain <mike.mcclain@nowhere.net>:
>>> Frank Terbeck <ft@bewatermyfriend.org> wrote:
>>>
>>>>>  for FILE in `ls *$1` ; do
> ...
>> b) it breaks on filenames with spaces (and other special characters).
> ...
>>     Using 'for i in `ls *`'-type loops breaks this and is one of the
>>     main reasons why people think spaces are bad in filenames.
>>     (They are not bad, ...
>
> In what sense are they not bad?  Yes, they're certainly legal per the
> filesystem and most tools that take filenames.  However, they and other
> special characters do make it more difficult to handle arbitrary file
> names.

No. They are never bad. It just takes a bit of practice to get used to
doing things in a robust way.
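A minimal sketch of the difference, assuming a POSIX sh and mktemp(1); the helper names count_ls_loop and count_glob_loop are hypothetical. Both count loop iterations over a directory that holds a single file named "my file.txt":

```shell
#!/bin/sh
# Broken: ls prints "my file.txt", and the unquoted command
# substitution splits it into the two words "my" and "file.txt".
count_ls_loop() {
  (
    cd "$1" || exit 1
    n=0
    for i in `ls *`; do n=$((n + 1)); done
    echo "$n"
  )
}

# Robust: the shell expands * itself, so each name stays one word.
count_glob_loop() {
  (
    cd "$1" || exit 1
    n=0
    for i in *; do n=$((n + 1)); done
    echo "$n"
  )
}

dir=$(mktemp -d) || exit 1
touch "$dir/my file.txt"
echo "ls loop:   $(count_ls_loop "$dir")"    # 2
echo "glob loop: $(count_glob_loop "$dir")"  # 1
rm -r "$dir"
```

The glob loop needs no tricks at all; the only thing to remember is to quote "$i" wherever it is used.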

> For example, if someone wants to use ls's feature of sorting by date
> (e.g., "ls -t *$1"), they can't combine it with the for-loop construct
> above (reliably).

Okay, I admit that sorting is one of the rare cases where

[snip]
find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2- | while IFS= read -r file ; do
  printf 'file: %s\n' "$file"   # do the real work on "$file" here
done
[snap]

or

[snip]
IFS='
'
set -f   # don't glob names containing *, ? or [
for i in `find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2-` ; do
  printf 'file: %s\n' "$i"
done
[snap]

loops are justified, at least in a POSIX shell. I really didn't think of
sorting in my original mail; thanks for pointing it out. (But you still
shouldn't use broken for loops.)
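One detail in these pipelines is worth checking: cut -d: -f2 keeps only the second colon-separated field, so a file name that itself contains a colon gets truncated, while -f2- keeps everything after the first colon. A sketch comparing the two, assuming GNU find (for -printf; %T@ is its documented seconds-since-the-epoch format) and POSIX touch -t. The function names are hypothetical, and -type f is added only to keep directories out of the list:

```shell
#!/bin/sh
# Newest first; -f2- keeps colons that are part of the file name.
newest_first() {
  find "$1" -type f -printf '%T@:%p\n' | sort -rn | cut -d: -f2-
}

# Same pipeline with -f2: everything after a second colon is lost.
newest_first_f2() {
  find "$1" -type f -printf '%T@:%p\n' | sort -rn | cut -d: -f2
}

dir=$(mktemp -d) || exit 1
touch -t 202001010000 "$dir/old"
touch -t 202101010000 "$dir/new:er"
newest_first "$dir"      # .../new:er, then .../old
newest_first_f2 "$dir"   # .../new, then .../old
rm -r "$dir"
```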

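Note that this for loop does _not_ feed a glob to an external program.
It only copes with spaces because of the changed $IFS parameter, and
that may lead to unexpected results if $IFS is not reset to its old
value inside the loop. Pathname expansion is another trap: unless it is
switched off with 'set -f', names containing *, ? or [ are expanded as
patterns.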
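A hypothetical demo of that pitfall: the loop body expands $opts unquoted and expects it to split into two options, which only happens once $IFS has been restored. count_args and opts are illustrative names; count_args just prints how many arguments it received:

```shell
#!/bin/sh
count_args() { echo "$#"; }
opts='-l -a'

oifs=$IFS
IFS='
'
for f in $(printf '%s\n' 'first file' 'second file'); do
  # IFS is still just a newline here, so $opts stays one word.
  printf '%s: %s args\n' "$f" "$(count_args $opts)"   # 1 args
done

for f in $(printf '%s\n' 'first file' 'second file'); do
  IFS=$oifs   # restore inside the loop; the list was already split
  printf '%s: %s args\n' "$f" "$(count_args $opts)"   # 2 args
done
```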

However, this can be avoided by restoring $IFS before the loop even starts:

[snip]
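oifs="$IFS"
IFS='
'
set -f   # don't glob names containing *, ? or [
set -- x $(find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2-)
set +f
IFS="$oifs"
shift    # drop the dummy "x" placeholder
while [ "$#" -gt 0 ] ; do
  printf 'file: %s\n' "$1"
  shift
done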
[snap]

This also works in a pure POSIX shell like dash: POSIX shells are not
limited to ten positional parameters. Only the parameters after $9 have
to be written with braces (${10}, ${11}, ...), and this loop never needs
them, because it only ever reads "$1".
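Positional parameters past $9 are easy to check directly. A minimal sketch with hypothetical helper names, assuming any POSIX sh (dash included): $10 is parsed as $1 followed by a literal 0, so the braces are what make the tenth parameter reachable.

```shell
#!/bin/sh
nargs()     { echo "$#"; }
tenth()     { echo "${10}"; }   # braces select the tenth parameter
not_tenth() { echo "$10"; }     # parsed as "$1" followed by "0"

nargs     a b c d e f g h i j k   # 11
tenth     a b c d e f g h i j k   # j
not_tenth a b c d e f g h i j k   # a0
```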

Of course, all of these line-based loops break with newline characters
in filenames, but newlines are really uncommon (probably only left on a
system by users who don't want their files to be deleted. :-)).
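When newlines do have to be handled, one common POSIX approach is to let find(1) pass the names to sh as arguments, so nothing is ever split. This sketch gives up the sorting; list_files is a hypothetical name:

```shell
#!/bin/sh
# find hands batches of names to sh as "$@"; "for f do" loops over them.
list_files() {
  find "$1" -type f -exec sh -c '
    for f do
      printf "file: %s\n" "$f"
    done
  ' sh {} +
}

dir=$(mktemp -d) || exit 1
touch "$dir/plain" "$dir/$(printf 'two\nlines')"
list_files "$dir"   # each file gets exactly one "file: " prefix
rm -r "$dir"
```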

And in zsh, you would actually do:
[snip]
for i in **/*(om) ; do foobar $i ; done
[snap]

Yes, zsh does recursive globbing and lets you define the sorting of
the generated file list.

It's really a pity that find(1) cannot sort by itself (even if only by
a handful of criteria).

But we are slowly drifting off topic here. I just wanted to make sure
that beginners are not confronted with problematic for-loop constructs
like the one in the first mail I was replying to. Manipulating $IFS is
probably not something to confront beginners with either.

> Hey, is there any command for taking a filename and escaping/encoding
> shell-special characters to make a string that, when parsed by the
> shell, specifies that filename?  I'm thinking of something that would
> work like this:
>
>    for i in `encode_for_shell *` ; ...
[...]

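No, that is not how shells work. The output of a command substitution
is only split on $IFS and glob-expanded; quotes and backslashes in it
are not parsed again, so shell quoting produced by such a command would
not be honored.
Just to repeat this once and for all:
_Never_ do 'for i in `ls *`'. Never. It's broken.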
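A hypothetical demo of why such an encoder would not help: the quotes it prints come back from the expansion as ordinary characters, and only eval would parse them again (with all the risks eval brings). encoded and count_words are illustrative names:

```shell
#!/bin/sh
count_words() { echo "$#"; }

# What an encoder might print for the file name "my file.txt".
encoded="'my file.txt'"

count_words $encoded          # 2: the words are 'my and file.txt'
eval "count_words $encoded"   # 1: eval re-parses the quotes
```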

> > some people just do not know how to handle them properly.)
>
> You might not be, but it sounds like you're blaming users.  Sometimes
> it's developers of tools (including designers of formats) that don't
> have an escape mechanism to handle spaces or other special characters
> (or don't provide support for encoding special characters) who are to
> blame.

Well, the shell is really old, and it has its flaws. That is why it is
not easy for beginners to use and understand, especially when they are
so often taught the wrong way to do things. I admit that it can be
quite difficult to do things right[tm]. I make mistakes all the time
when scripting in 'sh' (at least when the script is more than
trivial).

[...]
>> Some people use things like this instead:
>> [snip]
>> ls * | while read file ; do whatever_command "$file" ; done
>> [snap]
>> This is just a little better than the for loop. It still breaks in
>> some situations. 
>
> I see how it would break with a newline character in a file name.
> What other cases break?

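Broken aliases (for example an alias that makes 'ls' add -F markers or
colour codes to its output).
Too long argument lists. Yes, 'ls | while ...' does not have the
argument problem, but as soon as you write 'ls *', the shell expands the
glob into ls's arguments, and with enough files that fails with
"Argument list too long".
Also, plain 'read' strips leading and trailing blanks and treats
backslashes as escape characters; 'IFS= read -r' avoids that.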
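The read problem is easy to see with a name that starts with blanks and contains a backslash. A sketch in plain POSIX sh; plain_read and raw_read are hypothetical helper names:

```shell
#!/bin/sh
# Plain read: strips leading/trailing blanks, eats backslashes.
plain_read() { printf '%s\n' "$1" | { read line; printf '[%s]\n' "$line"; }; }

# IFS= read -r: returns the line exactly as it was printed.
raw_read()   { printf '%s\n' "$1" | { IFS= read -r line; printf '[%s]\n' "$line"; }; }

plain_read '  a\b'   # [ab]
raw_read   '  a\b'   # [  a\b]
```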

>> There is _no_ reason why 'ls' should ever be used to generate file
>> lists for loops of any kind.
>
> What about things that ls does that the shell's expansion of wildcards
> does not do (e.g., sorting by date or size)?
>
> (Maybe ls should have an equivalent to find's "-print0" option.)

In these cases, you use find(1) (in conjunction with other standard
tools, like sort, cut etc.). Note that -printf is a GNU find extension.
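For the size case from the question, a sketch in the same style, again assuming GNU find (%s prints the size in bytes); largest_first is a hypothetical name, and cut -f2- keeps spaces inside names:

```shell
#!/bin/sh
# Largest files first: "SIZE PATH" lines, sorted numerically, size cut off.
largest_first() {
  find "$1" -type f -printf '%s %p\n' | sort -rn | cut -d' ' -f2-
}

dir=$(mktemp -d) || exit 1
printf 'x' > "$dir/small"
printf 'xxxxx' > "$dir/big one"
largest_first "$dir"   # .../big one, then .../small
rm -r "$dir"
```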


Please note that what I am writing here is not a list of must-dos, of
course. I do not intend to attack anybody. There are people who know
POSIX shell scripting far better than I do, so who am I to judge
others? But 'for i in `ls *`' is really annoyingly wrong, even in my
eyes. :-)

So, when you are writing one-liners at the shell prompt and you know
what data you are dealing with, you can do whatever works quickest.
But if you are writing real scripts that are supposed to work (with
data you potentially don't know in advance), you need to do things in
a proper and robust way.

Sorry for the lengthy mail. I hope I made myself a little clearer and
didn't spread buggy code. :-)

Regards, Frank

-- 
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925


