On Tue, Jan 17, 2012 at 09:12:58AM +0000, Lars Wirzenius wrote:
> rdfind seems to be quickest one, but duff compares well with hardlink,
> which (see http://liw.fi/dupfiles/) was the fastest one I knew of in
> Debian so far.

Does anyone know of a duplicate file finder that can keep its
index of seen files in an on-disk database instead of in RAM? When
looking for duplicates in a tree of hundreds of millions of files,
an in-memory index can otherwise require quite a lot of RAM.

Perhaps it can be worked around using lots of swap, but I would have
thought this could lead to other processes getting swapped out, when
generally I would rather that the duplicate finder just got slower.
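
To make the idea concrete, here is a rough sketch of the kind of thing I
mean, in Python with the standard sqlite3 module. It is not how rdfind,
duff or hardlink work; the function names, table names and the
size-then-SHA-256 strategy are all my own assumptions for illustration.
Per-file state goes into an SQLite file on disk, so memory use should
stay roughly flat and the cost shows up as disk I/O instead.

```python
import hashlib
import itertools
import os
import sqlite3


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, db_path):
    """Yield lists of paths under root that have identical contents.

    Hypothetical helper: the index of seen files lives in the SQLite
    database at db_path rather than in a Python dict.
    """
    db = sqlite3.connect(db_path)
    try:
        db.execute("CREATE TABLE IF NOT EXISTS files "
                   "(path TEXT PRIMARY KEY, size INTEGER)")
        db.execute("CREATE INDEX IF NOT EXISTS files_by_size ON files (size)")
        db.execute("CREATE TABLE IF NOT EXISTS hashes "
                   "(path TEXT PRIMARY KEY, digest TEXT)")
        db.execute("CREATE INDEX IF NOT EXISTS hashes_by_digest "
                   "ON hashes (digest)")

        # Pass 1: record the size of every regular file (symlinks skipped).
        # Paths are stored as text, so names that are not valid in the
        # filesystem encoding would need extra handling.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                db.execute("INSERT OR REPLACE INTO files VALUES (?, ?)",
                           (path, os.path.getsize(path)))
        db.commit()

        # Pass 2: hash only files whose size is shared with another file.
        candidates = db.execute(
            "SELECT path FROM files WHERE size IN "
            "(SELECT size FROM files GROUP BY size HAVING COUNT(*) > 1)")
        for (path,) in candidates:
            try:
                digest = file_digest(path)
            except OSError:
                continue  # file vanished or became unreadable since pass 1
            db.execute("INSERT OR REPLACE INTO hashes VALUES (?, ?)",
                       (path, digest))
        db.commit()

        # Pass 3: stream rows ordered by digest and group equal neighbours.
        rows = db.execute(
            "SELECT digest, path FROM hashes ORDER BY digest, path")
        for _digest, group in itertools.groupby(rows, key=lambda r: r[0]):
            paths = [path for _, path in group]
            if len(paths) > 1:
                yield paths
    finally:
        db.close()
```

Usage would be something like
`for group in find_duplicates("/srv/data", "/var/tmp/seen.db"): print(group)`,
where both paths are placeholders. The database file should live outside
the tree being scanned, and whether this is fast enough at hundreds of
millions of files is something I have not measured.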


