
Re: Bug#656142: ITP: duff -- Duplicate file finder



Lars Wirzenius, on Tue 17 Jan 2012 10:45:20 +0000, wrote:
> > > Personally, I would be wary of using checksums for file comparisons,
> > > since comparing files byte-by-byte isn't slow (you only need to
> > > do it to files that are identical in size, and you need to read
> > > all the files anyway).
> > 
> > In some cases you may have a lot of files with identical size, so at
> > least a simple SSE-prone thing like crc is useful.
> 
> That's a good point. However, the pathological case would need to
> be quite pathological, since you can check around a thousand files
> of the same size at the same time (i.e., the number of open files
> per process), which is fairly rare for most people. But not all
> people, of course.

I'm not sure I understand exactly what you mean. If you have even
just a hundred files of the same size, comparing them pairwise takes
on the order of ten thousand (100 x 100) file comparisons! Using a
hash reduces that to reading each file once and indexing the hundred
file hashes.
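
To illustrate the idea, here is a rough sketch in Python. It is only
an assumption of how such a tool could work, not duff's actual code:
the names file_digest and find_duplicates are made up for this
example, and SHA-256 is picked arbitrarily (a crc would be cheaper,
but candidates would then need a byte-by-byte confirmation).

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths whose size and digest both match (hypothetical helper)."""
    # Step 1: bucket by size, which needs no file reads at all.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Step 2: index same-size files by digest; each file is read once,
        # so n files cost n reads instead of roughly n*n comparisons.
        by_digest = defaultdict(list)
        for p in same_size:
            by_digest[file_digest(p)].append(p)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

Calling find_duplicates() on a list of paths returns a list of
groups, each group being the paths that share both size and digest.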

Samuel

