
Re: Script to delete duplicate files



On Sat, May 20, 2006 at 09:42:25AM -0400, hendrik@topoi.pooq.com wrote:
> On Fri, May 19, 2006 at 07:10:41PM -0700, Curtis Vaughan wrote:
> > Ok, using fdupes -f I have created a file that contains a list of all 
> > duplicate files. So, what command can I run against that file to delete 
> > all the files listed in it?
> > 
> > Or since I know that fdupes -f works, could I just do something like:
> > 
> > fdupes -f ./ | rm *
> > 
> > or would that rm everything?
> > 
> > Thanks
> 
> OK.  Here's my paranoia.
> 
> If there are, say, 8 identical files, does the output of fdupes contain 
> 8 or 7 files?  If it lists all of them, you'll lose all of them.  If 7, 
> it will leave you one.
> 
> Also -- better be sure fdupes is not fooled by symbolic links into 
> thinking the original is a duplicate.  Or by one file system being 
> mounted at multiple locations.
> 

I think all 8 are reported, but you can check: generate 8 identical
files in a test environment, run fdupes on them and count the names it
prints. With files that pass the tests documented on the man page,
there is no 'original' file that is somehow logically different from
the other (n-1) 'duplicates'.  You can't reliably use the creation date
to pick one, because it can be identical across copies and because it
can be modified. The man page also mentions problems with confusing
real files and soft links to real files. Read it.
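The test-environment check can be sketched in POSIX sh. This is a minimal sketch, not something from the fdupes documentation: make_identical_files and count_listed are hypothetical helper names, and the only fdupes usage assumed is a directory argument plus the -f option already discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: make_identical_files DIR N
# Creates N files with identical contents in DIR, so fdupes can be
# tried on throwaway data before it is pointed at anything real.
make_identical_files() {
    dir=$1
    n=$2
    mkdir -p "$dir"
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'same contents\n' > "$dir/copy$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: count_listed
# Counts the non-blank lines on stdin, i.e. the file names fdupes
# prints; blank lines (set separators) are not counted.
count_listed() {
    grep -c .
}
```

For example, after defining the helpers:

    testdir=$(mktemp -d)
    make_identical_files "$testdir" 8
    fdupes "$testdir" | count_listed
    fdupes -f "$testdir" | count_listed
    rm -r "$testdir"

If the two counts differ by one, -f is leaving one name per set out of its list.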

But the above post indicates to me that you are beginning to realize
that this is not merely a scripting problem.

Also, getting rid of duplicate files can break a system, even if
one good copy of each identity class is left behind. There are lots
of pieces of software that only work correctly if a particular file
is available in a particular place. Get rid of the copy that the
software knows about and you break the software.
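On the original question: rm does not read file names from standard input, so in `fdupes -f ./ | rm *` the pipe is ignored and the shell expands * to every non-hidden name in the current directory; rm would delete every one of them that is not a directory. If, after the checks above, you still want to act on a saved list, here is a cautious sketch in POSIX sh. remove_listed is a hypothetical helper, not an fdupes feature. It assumes one path per line (names containing newlines will break it), skips blank lines such as the separators fdupes prints between sets, and only prints what it would do unless given --really. Run it from the directory where you ran fdupes, since paths like ./foo are relative.

```shell
#!/bin/sh
# Hypothetical helper: remove_listed LISTFILE [--really]
# Reads one path per line from LISTFILE (e.g. saved `fdupes -f` output).
# Default is a dry run that only prints; pass --really to delete.
remove_listed() {
    list=$1
    mode=${2:-dry-run}
    # IFS= and -r keep leading spaces and backslashes in names intact;
    # the -n test also handles a last line with no trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        if [ -z "$path" ]; then
            continue                      # blank separator between sets
        fi
        if [ "$mode" = "--really" ]; then
            rm -- "$path"                 # -- guards names starting with '-'
        else
            printf 'would remove: %s\n' "$path"
        fi
    done < "$list"
}
```

Typical use, keeping the list outside the tree being scanned:

    fdupes -f ./ > /tmp/dupes.txt
    remove_listed /tmp/dupes.txt             # read this output first
    remove_listed /tmp/dupes.txt --really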

-- 
Paul E Condon           
pecondon@mesanetworks.net


