Re: Delete 4 million files
On Wed, Mar 25, 2009 at 03:10:42PM +0000, Tzafrir Cohen (tzafrir@cohens.org.il) wrote:
> $ for i in `seq 4000000`; do echo something quite longer; done | xargs
> /bin/echo | wc
> 756 12000000 92000000
[...]
> So it's indeed not 4M processes, but still quite a few.
Even 756 is much less than 4M.
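The batching is easy to see directly: each line that xargs' echo prints corresponds to one fork()+exec() of the child. A quick sketch (the exact count depends on your system's argument-size limit, so treat the numbers as illustrative):

```shell
#!/bin/sh
# Each output line from xargs' echo is one invocation (one fork+exec).
# With 100000 short arguments, xargs packs thousands into each call,
# so the invocation count is tiny compared to the argument count.
invocations=$(seq 100000 | xargs echo | wc -l)
echo "xargs ran echo $invocations times for 100000 arguments"
```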
> But worst:
> you're traversing the directory many times. And you're telling rm in
> which explicit order to remove files, rather than simply the native
> order of the files in the directory (or whatever is convenient for the
> implementor). Which probably requires rm to do a number of extra
> lookups in the directory.
Interesting point; I hadn't thought of that.
How much a fork() costs compared to reading a directory
entry depends on things like disk and CPU speed,
available memory, filesystem type &c.
To get an idea which way it falls I did a quick test with 500k files
(created by seq 500000 | xargs touch) on my box.
First on an ext3 filesystem:
rm -rf testd                       4m11.909s
find testd -type f | xargs rm      4m42.025s
find testd -type f -exec rm {} \;  62m59.030s
find testd -type f -delete         4m19.340s
Then on tmpfs:
rm -rf testd                       0m2.507s
find testd -type f | xargs rm      0m6.318s
find testd -type f -exec rm {} \;  58m34.645s
find testd -type f -delete         0m3.362s
So, it would seem the number of rm calls indeed dominates
the time needed, not directory traversal.
Of course, xargs was helped here by the fact that the filenames
were short (at most 12 characters including the directory name),
but the speedup over -exec is still rather impressive.
If anyone can come up with a scenario where -exec
is significantly faster than xargs, I'd be interested.
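One footnote worth adding: find also supports -exec rm {} + (the POSIX "+" terminator), which batches arguments like xargs does, one exec per batch rather than per file, while still handling filenames containing spaces or newlines safely. A small sketch:

```shell
#!/bin/sh
# "-exec ... {} +" gathers as many paths per rm invocation as fit
# on a command line, so it should perform close to the xargs
# pipeline rather than the one-fork-per-file "\;" form.
mkdir testd2
( cd testd2 && seq 1000 | xargs touch )
find testd2 -type f -exec rm {} +
rmdir testd2    # succeeds only if every file was in fact removed
```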
--
Tapani Tarvainen