
Re: how to write a script that recursively check files in a directory with md5sum



John Summerfield wrote:

Matthias Czapla wrote:

On Thu, Jul 15, 2004 at 09:05:54AM +0800, John Summerfield wrote:
I don't use -exec with find any more because it's slow. When you pipe the names into xargs, as I do, spaces in file names cause the problem I described.
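The usual way around the spaces problem is to separate names with NUL bytes instead of newlines. A minimal sketch, assuming GNU find (`-print0`), GNU xargs (`-0`, and `-r` so md5sum is not run when no files are found) and GNU coreutils md5sum; the function name `checksum_tree` is hypothetical:

```shell
#!/bin/bash
# checksum_tree DIR: print an md5sum line for every regular file under DIR.
# -print0 / -0 separate names with NUL bytes, so names containing spaces
# reach md5sum intact; xargs packs many names into each md5sum run
# instead of starting one process per file.
# Assumes GNU find/xargs/md5sum (-r is a GNU xargs option).
checksum_tree() {
    find "$1" -type f -print0 | xargs -0 -r md5sum
}
```

For example, `checksum_tree ~/isos > isos.md5` records the sums, and `md5sum -c isos.md5` later re-checks every listed file.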


Well, until now I didn't even know what xargs was for; thanks for the
pointer.

For slowness, consider this (when no command is given, xargs runs echo, so the second count is the number of echo invocations):
summer@Dolphin:~$ find ~ -type f | wc -l
886076
summer@Dolphin:~$ find ~ -type f -print0 | xargs -0 | wc -l
3990
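To see the same effect on a small tree, a rough sketch that counts how many processes each approach starts, assuming GNU find and xargs; `count_invocations` is a hypothetical helper, not something from the thread:

```shell
#!/bin/bash
# count_invocations DIR: compare process counts for the two approaches.
# "find -exec echo {} \;" starts one echo per file, so it prints one line
# per file; xargs packs as many names as fit into each run of echo, and
# each run prints exactly one line. $(( )) strips any padding from wc.
count_invocations() {
    local per_file batched
    per_file=$(( $(find "$1" -type f -exec echo {} \; | wc -l) ))
    batched=$(( $(find "$1" -type f -print0 | xargs -0 -r echo | wc -l) ))
    echo "per-file: $per_file batched: $batched"
}
```

On a directory holding five files this prints `per-file: 5 batched: 1`.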


You're right, xargs is faster (60 times faster in the case of ls), but only if
the actual command isn't doing very much. For md5sum there is practically
no difference in speed (I have just done some measurements).
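A measurement like this can be repeated with bash's `time` keyword. The sketch below, assuming bash and GNU find/xargs, also times `find -exec ... {} +`, which batches names much as xargs does (it is specified by POSIX but may be missing from older versions of find); `compare_md5_speed` is a hypothetical name:

```shell
#!/bin/bash
# compare_md5_speed DIR: time three ways of running md5sum over a tree.
# md5sum output is discarded; labels and timings are written to stderr.
compare_md5_speed() {
    local dir=$1
    echo "one md5sum per file (-exec ... \\;):" >&2
    time find "$dir" -type f -exec md5sum {} \; > /dev/null
    echo "batched by xargs:" >&2
    time find "$dir" -type f -print0 | xargs -0 -r md5sum > /dev/null
    echo "batched by find itself (-exec ... +):" >&2
    time find "$dir" -type f -exec md5sum {} + > /dev/null
}
```

Running it over a tree of many small files and then over a few large ones should show where the per-process cost matters.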

I think the results will depend on the workload.
md5sum isn't the smallest binary around, but there are larger ones too.

If the md5sum binary stays cached, that's RAM you can't use for something else for a while. If not, it has to be read from disk again for each invocation, and it may be larger than the files you're handling.

If you're processing a lot of small files, the difference will be huge. If you're processing 700 Mbyte ISOs or 9.4 Gbyte DVD images, the difference will be immeasurably small.



BTW, what are all those files in your home directory? I have only
about 14000 and thought that was the biggest mess ever ;)
Oh, stuff. Source of debs, built and otherwise. CVS checkouts of stuff. Documents. Photos (see my sig for some). IBM operating systems.

Lotsa stuff. 12 Gbytes of stuff. Too much stuff.

Since I'm kinda new at this I just have to ask: what's wrong with a
for-loop?
Too slow?
I have no idea, but I use something like this to recurse down a tree and
do something with every file:

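#!/bin/bash
# Run a command on every regular file below the current directory.
# Note: the backtick expansion splits find's output on whitespace,
# so file names containing spaces are not handled correctly.
for i in `find . -type f`
do
     md5sum "$i"    # whatever you want to do; use "$i" instead of the filename
done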
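A for-loop over `find` output splits names at spaces. One common way to keep a per-file loop while avoiding that is to read NUL-separated names in a while loop. A minimal sketch, assuming bash (for `read -d ''`) and GNU find (for `-print0`); the function name `md5_each` is hypothetical:

```shell
#!/bin/bash
# md5_each DIR: run md5sum on each regular file under DIR, one at a time.
# find -print0 ends each name with a NUL byte; read -d '' stops at each NUL,
# IFS= keeps leading/trailing blanks, and -r keeps backslashes literal.
md5_each() {
    find "$1" -type f -print0 |
    while IFS= read -r -d '' f; do
        md5sum "$f"    # replace with whatever you want to do with "$f"
    done
}
```

Because the loop sits at the end of a pipeline, bash runs it in a subshell, so variables assigned inside it are not visible after `done`.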

Regards
Sturla


