Re: Bug#656142: ITP: duff -- Duplicate file finder

To: debian-devel@lists.debian.org
Subject: Re: Bug#656142: ITP: duff -- Duplicate file finder
From: Samuel Thibault <sthibault@debian.org>
Date: Tue, 17 Jan 2012 12:03:41 +0100
Message-id: <20120117110341.GL4320@type.bordeaux.inria.fr>
Mail-followup-to: debian-devel@lists.debian.org
In-reply-to: <20120117104520.GA29095@havelock.liw.fi>
References: <20120116205813.24274.12515.reportbug@localhost6.localdomain6> <20120117091258.GA20971@havelock.liw.fi> <20120117093020.GA4320@type.bordeaux.inria.fr> <20120117104520.GA29095@havelock.liw.fi>

Lars Wirzenius, on Tue 17 Jan 2012 10:45:20 +0000, wrote:
> > > Personally, I would be wary of using checksums for file comparisons,
> > > since comparing files byte-by-byte isn't slow (you only need to
> > > do it to files that are identical in size, and you need to read
> > > all the files anyway).
> > 
> > In some cases you may have a lot of files with identical size, so at
> > least a simple SSE-prone thing like crc is useful.
> 
> That's a good point. However, the pathological case would need to
> be quite pathological, since you can check around a thousand files
> of the same size at the same time (i.e., the number of open files
> per process), which is fairly rare for most people. But not all
> people, of course.

I'm not sure I understand exactly what you mean. If you have even
just a hundred files of the same size, comparing them pairwise takes
thousands of file comparisons (100 * 99 / 2 = 4950 in the worst case)!
Using a hash reduces that to hashing each file once and indexing the
hundred file hashes.
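
To make that concrete, here is a rough Python sketch of the approach:
group by size first, then index each same-size group by a content hash.
The names find_duplicates and file_digest are made up for illustration,
and SHA-1 is just an assumption; I am not claiming this is how duff
itself does it.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Return a hex digest of the file's contents, read in chunks."""
    h = hashlib.sha1()  # assumption: any reasonable hash would do here
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return sorted groups of paths whose size and digest both match."""
    # Step 1: only files of identical size can be duplicates.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Step 2: one hash per file, indexed in a dict, instead of
        # comparing every pair of same-size files.
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Each file is read once, so n same-size files cost n hash computations
plus n dictionary insertions. If hash collisions are a concern, the
files within a reported group can still be confirmed byte-by-byte.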

Samuel

Follow-Ups:
- Re: Bug#656142: ITP: duff -- Duplicate file finder
  - From: Roland Mas <lolando@debian.org>

References:
- Bug#656142: ITP: duff -- Duplicate file finder
  - From: Kamal Mostafa <kamal@whence.com>
- Re: Bug#656142: ITP: duff -- Duplicate file finder
  - From: Lars Wirzenius <liw@liw.fi>
- Re: Bug#656142: ITP: duff -- Duplicate file finder
  - From: Samuel Thibault <sthibault@debian.org>
- Re: Bug#656142: ITP: duff -- Duplicate file finder
  - From: Lars Wirzenius <liw@liw.fi>
