
Re: Bug#656142: ITP: duff -- Duplicate file finder



On Tue, Jan 17, 2012 at 10:30:20AM +0100, Samuel Thibault wrote:
> Lars Wirzenius, on Tue 17 Jan 2012 09:12:58 +0000, wrote:
> > real  user  system  max RSS  elapsed  cmd                                   
> >  (s)   (s)     (s)    (KiB)      (s)                                        
> >  3.2   2.4     5.8    62784      5.8  hardlink --dry-run files > /dev/null  
> >  1.1   0.4     1.6    15424      1.6  rdfind files > /dev/null              
> >  1.9   0.2     2.2     9904      2.2  duff-0.5/src/duff -r files > /dev/null
> 
> And fdupes on the same set of files?

real  user  system  max RSS  elapsed  cmd                                   
 (s)   (s)     (s)    (KiB)      (s)                                        
 3.1   2.4     5.5    62784      5.5  hardlink --dry-run files > /dev/null  
 1.1   0.4     1.6    15392      1.6  rdfind files > /dev/null              
 1.3   0.9     2.2    13936      2.2  fdupes -r -q files > /dev/null        
 1.9   0.2     2.1     9904      2.1  duff-0.5/src/duff -r files > /dev/null

Someone should run the benchmark on a large set of data, preferably
on various kinds of real data, rather than my small synthetic data set.
(I have, alas, neither the time nor the hardware to do that.)

> > Personally, I would be wary of using checksums for file comparisons,
> > since comparing files byte-by-byte isn't slow (you only need to
> > do it to files that are identical in size, and you need to read
> > all the files anyway).
> 
> In some cases you may have a lot of files with identical size, so at
> least a simple SSE-prone thing like crc is useful.

That's a good point. However, the pathological case would need to
be quite pathological, since you can compare around a thousand files
of the same size at the same time (i.e., up to the number of open
files allowed per process), and having more same-size files than that
is fairly rare for most people. But not all people, of course.
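
To make the idea concrete, here is a rough sketch in Python of the
approach I mean: group files by size, then read every file in a
same-size group in lockstep and split the group whenever the chunks
differ. It is not how duff, fdupes, or rdfind actually work; the names
find_duplicates, split_by_content, and CHUNK are made up for
illustration, and the sketch assumes each group fits within the
per-process open file limit.

```python
import os
from collections import defaultdict

CHUNK = 64 * 1024  # hypothetical read size, not tuned


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            # Only regular files; skip symlinks to avoid trivial "duplicates".
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) > 1:  # a file with a unique size has no duplicate
            duplicates.extend(split_by_content(paths))
    return duplicates


def split_by_content(paths):
    """Partition same-size files into groups with identical bytes.

    Every file in the group is open at once, so the group size is
    bounded by the per-process open file limit (an assumption here).
    """
    handles = {p: open(p, "rb") for p in paths}
    try:
        groups = [list(paths)]
        finished = []
        while groups:
            next_groups = []
            for group in groups:
                by_chunk = defaultdict(list)
                for p in group:
                    by_chunk[handles[p].read(CHUNK)].append(p)
                for chunk, members in by_chunk.items():
                    if len(members) < 2:
                        continue  # differs from all others, drop it early
                    if chunk == b"":
                        finished.append(sorted(members))  # hit EOF together
                    else:
                        next_groups.append(members)
            groups = next_groups
        return finished
    finally:
        for handle in handles.values():
            handle.close()
```

Calling find_duplicates("files") would return one list of paths per set
of identical files. A file stops being read as soon as it differs from
everything else in its group, which is the advantage over checksumming
each file in full. A group larger than the open file limit would need
a fallback, such as the checksum Samuel suggests.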

-- 
Freedom-based blog/wiki/web hosting: http://www.branchable.com/
