
Re: finding similar files



Hendrik Boom:
>
> There wouldn't happen to be any handy tools for searching a directory 
> tree with a few hundred ASCII files and telling me which ones have 
> similar content?

I don't know of any ready-made tools, but if your files aren't too
large, you could build something on top of my (or any other)
implementation of metric space indexes:

http://well-adjusted.de/mspace.py
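A metric space index only needs a distance function that behaves like a
metric: zero for identical inputs, symmetric, and obeying the triangle
inequality. As a hedged illustration that is not taken from mspace.py,
here is one cheap choice for plain ASCII text, the Jaccard distance
between the sets of words in two texts. The helper names are
hypothetical, and word order is ignored, so this measures shared
vocabulary rather than shared passages:

```python
import re


def word_set(text):
    """Return the lower-cased set of alphanumeric words in a text."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_distance(a, b):
    """Jaccard distance between the word sets of two texts.

    0.0 means both texts use exactly the same words, 1.0 means they share
    none. This function is a proper metric, so it can drive a metric
    space index.
    """
    sa, sb = word_set(a), word_set(b)
    union = sa | sb
    if not union:
        return 0.0  # two texts without any words are treated as identical
    return 1.0 - len(sa & sb) / len(union)


if __name__ == "__main__":
    # 3 shared words out of 5 distinct words overall
    print(jaccard_distance("the quick brown fox", "the quick red fox"))
```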

Depending on the number and size of your files, this might be a little
slow, though. This implementation is faster, but has fewer features:

http://code.activestate.com/recipes/572156/

You would have to index all your files, then iterate over them again and
search the index for files similar to each one. It is hard to tell
whether this approach would save time compared with simply comparing
every pair of files, but it is a good opportunity to learn some Python,
in case you don't already know it. :)
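The index-then-query loop could be sketched roughly as follows. This is
a self-contained illustration that does not use mspace.py or the
ActiveState recipe. It assumes a BK-tree (one kind of metric space
index, suited to integer distances) with a line-level edit distance as
the metric, and the default radius of 5 changed lines is an arbitrary,
hypothetical threshold. All function and class names are made up for the
example:

```python
import os
import sys


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of lines)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute x -> y
        prev = cur
    return prev[-1]


class _Node:
    def __init__(self, key, item):
        self.key = key        # file name
        self.item = item      # value the metric is computed on
        self.children = {}    # distance to this node -> child node


class BKTree:
    """Minimal BK-tree: a metric space index for integer-valued metrics."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, key, item):
        if self.root is None:
            self.root = _Node(key, item)
            return
        node = self.root
        while True:
            d = self.distance(item, node.item)
            child = node.children.get(d)
            if child is None:
                node.children[d] = _Node(key, item)
                return
            node = child

    def search(self, item, radius):
        """Return sorted (distance, key) pairs for all items within radius."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = self.distance(item, node.item)
            if d <= radius:
                results.append((d, node.key))
            # Triangle inequality: only children whose edge distance lies in
            # [d - radius, d + radius] can contain matches.
            for dist, child in node.children.items():
                if d - radius <= dist <= d + radius:
                    stack.append(child)
        return sorted(results)


def read_lines(path):
    with open(path, encoding="ascii", errors="replace") as f:
        return f.read().splitlines()


def similar_files(top, radius):
    """Index every file under `top`, then query the index with each file."""
    tree = BKTree(edit_distance)
    files = {}
    for dirpath, _dirs, names in os.walk(top):
        for name in names:
            path = os.path.join(dirpath, name)
            files[path] = read_lines(path)
            tree.add(path, files[path])
    pairs = set()
    for path, lines in files.items():
        for d, other in tree.search(lines, radius):
            if other != path:
                pairs.add((d, *sorted((path, other))))  # report each pair once
    return sorted(pairs)


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    radius = int(sys.argv[2]) if len(sys.argv) > 2 else 5
    for d, a, b in similar_files(top, radius):
        print(f"{d:4d}  {a}  {b}")
```

Run it as `python similar.py some/dir 5` to list pairs of files that
differ by at most five inserted, deleted or changed lines. Each distance
computation costs time proportional to the product of the two files'
line counts, which is why this only pays off for reasonably small files.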

J.
-- 
I no longer believe my life will be long, happy, interesting or fulfilled
[Agree]   [Disagree]
                 <http://www.slowlydownward.com/NODATA/data_enter2.html>
