
Re: Bug#656142: ITP: duff -- Duplicate file finder



Samuel Thibault, on Tue 17 Jan 2012 12:15:16 +0100, wrote:
> Lars Wirzenius, on Tue 17 Jan 2012 10:45:20 +0000, wrote:
> > On Tue, Jan 17, 2012 at 10:30:20AM +0100, Samuel Thibault wrote:
> > > Lars Wirzenius, on Tue 17 Jan 2012 09:12:58 +0000, wrote:
> > > > real  user  system  max RSS  elapsed  cmd                                   
> > > >  (s)   (s)     (s)    (KiB)      (s)                                        
> > > >  3.2   2.4     5.8    62784      5.8  hardlink --dry-run files > /dev/null  
> > > >  1.1   0.4     1.6    15424      1.6  rdfind files > /dev/null              
> > > >  1.9   0.2     2.2     9904      2.2  duff-0.5/src/duff -r files > /dev/null
> > > 
> > > And fdupes on the same set of files?
> > 
> > real  user  system  max RSS  elapsed  cmd                                   
> >  (s)   (s)     (s)    (KiB)      (s)                                        
> >  3.1   2.4     5.5    62784      5.5  hardlink --dry-run files > /dev/null  
> >  1.1   0.4     1.6    15392      1.6  rdfind files > /dev/null              
> >  1.3   0.9     2.2    13936      2.2  fdupes -r -q files > /dev/null        
> >  1.9   0.2     2.1     9904      2.1  duff-0.5/src/duff -r files > /dev/null
> > 
> > Someone should run the benchmark on a large set of data, preferably
> > on various kinds of real data, rather than my small synthetic data set.
> 
> On my PhD work directory, with various stuff in it (500MiB, 18000 files,
> both big and small files (svn/git checkouts, etc.)), everything already
> in cache (no disk I/O):
> 
> hardlink -t --dry-run . > /dev/null       1,06s user 0,46s system 99% cpu 1,538 total
> rdfind . > /dev/null                      0,68s user 0,19s system 99% cpu 0,877 total
> fdupes -q -r . > /dev/null 2> /dev/null   0,80s user 0,90s system 99% cpu 1,708 total
> ~/src/duff-0.5/src/duff -r . > /dev/null  1,53s user 0,08s system 99% cpu 1,610 total

And with nothing in cache, on an SSD:

hardlink -t --dry-run . > /dev/null       1,86s user 1,23s system 12% cpu 24,260 total
rdfind . > /dev/null                      1,18s user 1,31s system 8%  cpu 27,837 total
fdupes -q -r . > /dev/null 2> /dev/null   1,30s user 2,13s system 11% cpu 29,820 total
~/src/duff-0.5/src/duff -r . > /dev/null  1,88s user 0,47s system 16% cpu 13,949 total

(Yes, the user times differ, and the measurements are stable. Also note
that I added -t to hardlink, since otherwise it takes the file timestamp
into account.)

I guess duff gets a clear win because it does not systematically compute
the checksum of every file that shares a size with another; for big
files, it first reads only a few bytes to rule out non-duplicates.
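The strategy described above can be sketched roughly as follows. This is a hypothetical illustration, not duff's actual code: the function names, the 4096-byte sample size, and the choice of SHA-1 are all invented for the example; the real tool may differ in every detail.

```python
# Sketch of a three-stage duplicate finder: group by size, then by a
# small leading sample, and only then pay for a full checksum. All
# names and constants here are illustrative assumptions.
import hashlib
import os
from collections import defaultdict

SAMPLE_SIZE = 4096  # assumed size of the cheap "first few bytes" read


def find_duplicates(root):
    # Stage 1: bucket regular files by size; unique sizes can never
    # be duplicates, so they are skipped without reading any data.
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Stage 2: bucket by the first SAMPLE_SIZE bytes. For big
        # files this often separates candidates without a full read.
        by_sample = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                by_sample[f.read(SAMPLE_SIZE)].append(path)
        # Stage 3: full checksum only for files that still collide.
        for group in by_sample.values():
            if len(group) < 2:
                continue
            by_digest = defaultdict(list)
            for path in group:
                h = hashlib.sha1()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 16), b""):
                        h.update(chunk)
                by_digest[h.digest()].append(path)
            duplicates.extend(g for g in by_digest.values() if len(g) > 1)
    return duplicates
```

This would explain the I/O numbers: when most same-sized files differ early, stage 2 replaces whole-file reads with one small read each, which matters most on a cold cache.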

samuel