Re: How can I find identical files in a directory

To: debian-user@lists.debian.org
Subject: Re: How can I find identical files in a directory
From: Chris Metzler <cmetzler@speakeasy.net>
Date: Tue, 20 Apr 2004 15:39:25 -0400
Message-id: <20040420153925.50209a41.cmetzler@speakeasy.net>
In-reply-to: <1082487795.1436.43.camel@debby>
References: <1082487795.1436.43.camel@debby>

On Tue, 20 Apr 2004 21:03:15 +0200
Wolfgang Pfeiffer <roto@gmx.net> wrote:
>
> My goal is to get easily rid of identical files on a system:

I did something like this once for a whole filesystem with a bash
script.  md5sum'ing *everything* wastes time and CPU cycles, since
most of the files you md5sum (probably) won't have duplicates.

Instead, what I did was to get an ls of all the directories in which
I wanted to search for duplicates (I used "find -type d -exec ls..."
since I was doing it over a filesystem).  I made sure the flags for
ls were such that I'd get a column with filesizes and a column with
pathnames.  And I had the output directed into a file.

Then, once that was done, I sorted the file (using the "sort" command)
using as sort key the column with filesizes, then used uniq (with
appropriate flags to only consider the filesize column) to trim out
lines for which no other file had the same size.  Then, I md5sum'd
all of those (output into a file), and used uniq on that file to find
duplicate md5sums.
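
The steps above could be sketched roughly as follows.  This is Python
rather than the bash/ls/sort/uniq/md5sum pipeline described, so it is an
illustration of the same idea (compare sizes first, checksum only the
files whose size is shared), not the original script.  The names
find_duplicates and md5_of are made up for this sketch, and it assumes
that skipping symlinks is acceptable.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root with identical size and MD5."""
    # Step 1: record the size of every regular file (the "ls" listing).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: symlinks are skipped so a link to a file is not
            # reported as a duplicate of its own target.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: ignore sizes that occur only once (the "sort | uniq" step)
    # and checksum only the remaining candidates.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, md5_of(path))].append(path)

    # Step 3: keep only the checksums shared by two or more files.
    return sorted(sorted(group) for group in by_hash.values() if len(group) > 1)
```

Calling find_duplicates("/some/dir") returns a list of lists, one per
set of identical files.  Matching MD5 sums make identical content very
likely but do not strictly prove it, so a byte-for-byte comparison
(cmp, for instance) before deleting anything is a sensible extra check.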

That's a pretty brute-force way to do it, but it works.  Now I'm
waiting for someone else to point out a much more elegant solution.
Heh.

-c

-- 
Chris Metzler			cmetzler@speakeasy.snip-me.net
		(remove "snip-me." to email)

"As a child I understood how to give; I have forgotten this grace since I
have become civilized." - Chief Luther Standing Bear
