
Re: how to write a script that recursively check files in a directory with md5sum



Matthias Czapla wrote:

On Thu, Jul 15, 2004 at 09:05:54AM +0800, John Summerfield wrote:
I don't use -exec with find any more because it's slow. When you pipe the names into xargs as I do, spaces in filenames cause the problem I described.
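The two approaches can be sketched like this (the /tmp/demo path is just for illustration): -exec spawns one process per file, while -print0 piped into xargs -0 batches many names per invocation and keeps names with spaces intact.

```shell
# Illustrative setup: a file whose name contains spaces.
mkdir -p /tmp/demo
printf 'hello\n' > '/tmp/demo/file with spaces.txt'

# One md5sum process per file -- slow on large trees:
find /tmp/demo -type f -exec md5sum {} \;

# Batched, NUL-delimited -- fast, and safe for spaces in names:
find /tmp/demo -type f -print0 | xargs -0 md5sum
```

A plain `find ... | xargs md5sum` (without -print0/-0) would split "file with spaces.txt" into three separate arguments, which is the problem described above.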

Well, until now I didn't even know about xargs' purpose, thanks for the
pointer.

For slowness, consider this:
summer@Dolphin:~$ find ~ -type f | wc -l
886076
summer@Dolphin:~$ find ~ -type f -print0 | xargs -0 | wc -l
3990
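What those two counts compare: the first is the number of files, the second is the number of command invocations xargs makes (with no command given, xargs runs echo, and each batch prints one line). A smaller self-contained sketch of the same effect:

```shell
# xargs packs as many arguments per invocation as the argument-length
# limit allows, so 100000 inputs collapse into a handful of echo runs,
# one output line per run.
seq 1 100000 | wc -l          # 100000 inputs
seq 1 100000 | xargs | wc -l  # far fewer lines: one per batch
```

So in the quoted figures, 886076 files were handled in only 3990 invocations -- that ratio is where the speedup over one-process-per-file comes from.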

You're right, xargs is faster (60 times in the case of ls), but only if
the actual command isn't doing very much. For md5sum there is practically
no difference in speed (I have just done some measurements).

I think the results will depend....
md5sum isn't the smallest binary around, but there are larger ones too.

If md5sum gets cached, that's RAM you can't use for something else for a while. If not, it may be larger than the files you're handling.

If you're processing a lot of small files, the difference will be huge. If you're processing 700 Mbyte ISOs or 9.4 Gbyte DVD images, the difference will be immeasurably small.
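Putting the pieces together for the question in the subject line, a minimal sketch of a recursive md5sum check might look like this (the script name, the $1 argument, and the sums.md5 file name are all illustrative):

```shell
#!/bin/sh
# Recursively checksum every regular file under a directory,
# space-safe, then verify against the recorded sums.
dir=${1:-.}

# Record checksums; -print0/-0 keeps odd filenames intact.
find "$dir" -type f -print0 | xargs -0 md5sum > sums.md5

# Later, verify: md5sum -c re-reads each file and compares.
md5sum -c sums.md5
```

Note that sums.md5 is written outside the checked tree here only by accident of the working directory; if it landed inside "$dir", a second run would checksum the sums file too.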



BTW, what are all those files in your home directory? I have only
about 14000 and thought that this is the biggest mess ever ;)

Oh, stuff. Source of debs, built and otherwise. CVS checkouts of stuff. Documents. Photos (see my sig for some). IBM operating systems.

Lotsa stuff. 12 Gbytes of stuff. Too much stuff.




--

Cheers
John

-- spambait
1aaaaaaa@computerdatasafe.com.au  Z1aaaaaaa@computerdatasafe.com.au
Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/


