Bug#520401: ITP: simhash -- generate similarity hashes to find nearly duplicate files
Package: wnpp
Severity: wishlist
Owner: Thomas Koch <thomas@koch.ro>
* Package name : simhash
Version : only GIT, no releases
Upstream Author : Bart Massey
* URL : http://wiki.cs.pdx.edu/forge/simhash.html
* License : BSD
Programming Lang: C
Description : generate similarity hashes to find nearly duplicate files
A useful question to ask about a pair of files is how similar they are. This
command-line tool estimates the "degree of similarity" between a pair of
nominally sequential
files such as text files. The tool uses Manasse's "shingleprinting" technique;