
Re: Delete 4 million files



On Wed, Mar 25, 2009 at 07:53:06AM -0500, Ron Johnson wrote:
> On 2009-03-25 05:16, Tzafrir Cohen wrote:
>> On Wed, Mar 25, 2009 at 10:19:37AM +0100, Rainer Kluge wrote:
>>> Tapani Tarvainen schrieb:
>>>>> kj wrote:
>>>>>> Now, I've been running the usual find . -type f -exec rm {} \;
>>>>>> but this is going at about 700,000 per day.  Would simply doing 
>>>>>> an rm  -rf on the Maildir be quicker?  Or is there a better 
>>>>>> way?
>>>> While rm -rf would certainly be quicker and is obviously preferred
>>>> when you want to remove everything in the directory, the find version
>>>> could be sped up significantly by using xargs, i.e.,
>>>> find . -type f -print0 | xargs -0 rm
>>>> This is especially useful if you want to remove files selectively
>>>> instead of everything at once.
>>>>
>>> This exact solution has been proposed just one week ago in this same thread by
>>> Rob Starling
>>
>> And this requires traversing the directory not just a single time but 4
>> million times (once per rm process). Not to mention that merely spawning
>> 4 million such processes is not fun (but spawning those would probably
>> fit nicely within 10 minutes or so).
>>
>
> But isn't that (preventing the spawning of 4M processes) the reason why 
> xargs was created?

$ for i in `seq 4000000`; do echo hi; done | xargs /bin/echo | wc
    252 4000000 12000000

$ for i in `seq 4000000`; do echo something quite longer; done |
    xargs /bin/echo | wc
    756 12000000 92000000

(Both took noticeable time, though less than a minute)
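
The first column of the wc output is the number of lines, and each
/bin/echo process prints exactly one line, so that column counts how many
times xargs started the command. A scaled-down sketch of the same
measurement follows. The -n 100 limit and the count of 1000 are
assumptions picked only to make the result predictable; without -n, the
batch size depends on the system's argument-length limit.

```shell
# Feed 1000 arguments to xargs, at most 100 per /bin/echo invocation.
# Each invocation prints one line, so wc -l counts the invocations.
n=1000
batches=$(seq "$n" | xargs -n 100 /bin/echo | wc -l | tr -d ' ')
echo "xargs started /bin/echo $batches times for $n arguments"
```

With 1000 arguments and 100 per batch this reports 10 invocations.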

So it's indeed not 4M processes, but still quite a few. But worse:
you're traversing the directory many times. And you're telling rm in
which explicit order to remove the files, rather than letting it use the
native order of the files in the directory (or whatever is convenient for
the implementor). That probably costs rm a number of extra lookups in
the directory.

At this point I wanted to write "well, one thing you can optimize
away is the sorting of the 4M files that find does. To disable it, use -".
But I could not find such an option. What I did find is:

  find /that/dir -type f -delete
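
To try this safely before pointing it at a real Maildir, here is a small
sketch on a scratch directory. The scratch directory from mktemp -d, the
cur/ and new/ subdirectories and the msgN file names are all made up for
the illustration, and -delete and -mindepth are extensions found in GNU
and BSD find, so this assumes one of those.

```shell
# Build a throwaway tree: 2 subdirectories with 200 empty files each.
dir=$(mktemp -d)
mkdir "$dir/cur" "$dir/new"
for i in $(seq 200); do
    : > "$dir/cur/msg$i"
    : > "$dir/new/msg$i"
done

before=$(find "$dir" -type f | wc -l | tr -d ' ')

# Remove regular files only, inside the single find process.
find "$dir" -type f -delete

after=$(find "$dir" -type f | wc -l | tr -d ' ')
# -type f leaves the directories themselves in place.
dirs=$(find "$dir" -mindepth 1 -type d | wc -l | tr -d ' ')
echo "files before: $before, after: $after, directories kept: $dirs"

# Clean up the scratch tree.
rm -rf "$dir"
```

Here find unlinks each file as it walks the tree, so there is no rm
process at all and no file list passed through a pipe.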

But we know GNU find is bloatware :-)

-- 
Tzafrir Cohen         | tzafrir@jabber.org | VIM is
http://tzafrir.org.il |                    | a Mutt's
tzafrir@cohens.org.il |                    |  best
ICQ# 16849754         |                    | friend

