Re: How can I find identical files in a directory

On Tue, 20 Apr 2004 21:03:15 +0200
Wolfgang Pfeiffer <roto@gmx.net> wrote:
> My goal is to get easily rid of identical files on a system:

I did something like this once for a whole filesystem with a bash
script.  md5sum'ing *everything* wastes time and CPU cycles, since
most of the files you'd checksum probably won't have duplicates.

Instead, what I did was to get an ls of all the directories in which
I wanted to search for duplicates (I used "find -type d -exec ls..."
since I was doing it over a filesystem).  I made sure the flags for
ls were such that I'd get a column with filesizes and a column with
pathnames.  And I had the output directed into a file.
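
A minimal sketch of that listing step, assuming GNU find is available.
Instead of parsing ls output it asks find to print the size and path
directly.  The helper name list_sizes and the 15-character size column
are choices made for this illustration, not taken from the original
script.

```shell
# Sketch only: assumes GNU find (for -printf) and no newlines in filenames.
# list_sizes is a hypothetical helper name.
list_sizes() {
    # Print each regular file under the given directories as a
    # right-aligned 15-character size in bytes, one space, then the path.
    find "$@" -type f -printf '%15s %p\n'
}
```

Something like "list_sizes /home /usr/local > sizes.txt" would then
leave the listing in a file.  The fixed-width size column is there so
that a later uniq can be told to compare only that column.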

Then, once that was done, I sorted the file (using the "sort" command)
using as sort key the column with filesizes, then used uniq (with
appropriate flags to only consider the filesize column) to trim out
lines for which no other file had the same size; a file with a unique
size can't have a duplicate.  Then, I md5sum'd all of the remaining
files (output into a file), sorted that output so equal checksums sat
on adjacent lines, and used uniq on it to find duplicate md5sums.
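
The sort/uniq/md5sum steps might look roughly like the sketch below.
It assumes GNU sort, uniq, cut, xargs and md5sum, filenames without
newlines or backslashes, and an input listing whose lines are a
15-character right-aligned size, one space, then the path (for
example from GNU find with -printf '%15s %p\n').  The function name
dupes_from_listing is hypothetical, and this is a reconstruction, not
the original script.

```shell
# Sketch only: GNU tools assumed; dupes_from_listing is a hypothetical name.
# Reads lines of the form "<15-char size> <path>" on stdin and prints the
# md5sum lines of files whose checksum occurs more than once.
dupes_from_listing() {
    LC_ALL=C sort |        # equal sizes end up on adjacent lines
    uniq -w 15 -D |        # keep only lines whose size column repeats
    cut -c 17- |           # drop the size column, leaving just the path
    tr '\n' '\0' |         # NUL-separate so spaces in paths survive xargs
    xargs -0 -r md5sum |   # checksum only the candidate files
    LC_ALL=C sort |        # equal checksums end up on adjacent lines
    uniq -w 32 -D          # print every line whose 32-char checksum repeats
}
```

Run as "dupes_from_listing < sizes.txt", where sizes.txt is a listing
in the format described above.  The output groups files with the same
checksum on consecutive lines.  Matching md5sums make files very
likely identical; a final cmp on each pair would confirm it.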

That's a pretty brute-force way to do it, but it works.  I'm now
waiting for someone else to point out a much more elegant solution.


Chris Metzler			cmetzler@speakeasy.snip-me.net
		(remove "snip-me." to email)

"As a child I understood how to give; I have forgotten this grace since I
have become civilized." - Chief Luther Standing Bear
