
Bug#520401: ITP: simhash -- generate similarity hashes to find nearly duplicate files



Package: wnpp
Severity: wishlist
Owner: Thomas Koch <thomas@koch.ro>


* Package name    : simhash
  Version         : only GIT, no releases
  Upstream Author : Bart Massey
* URL             : http://wiki.cs.pdx.edu/forge/simhash.html
* License         : BSD
  Programming Lang: C
  Description     : generate similarity hashes to find nearly duplicate files
 A useful question to ask about a pair of files is how similar they are. This
 command-line tool estimates the "degree of similarity" between a pair of
 nominally sequential files such as text files. The tool uses Manasse's
 "shingleprinting" technique;

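For readers unfamiliar with the idea, here is a minimal sketch of a
shingleprint-style comparison in C. It is not taken from the upstream
source. The names shingleprint() and similarity(), the constants SHINGLE_LEN
and MAX_PRINT, and the use of FNV-1a as the hash are all assumptions made for
illustration. The sketch hashes every overlapping run of SHINGLE_LEN bytes,
keeps the smallest distinct hash values as the fingerprint, and scores two
fingerprints by a simple overlap ratio.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SHINGLE_LEN 8    /* bytes per shingle (assumed value) */
#define MAX_PRINT   128  /* hashes kept per fingerprint (assumed value) */

/* 32-bit FNV-1a, used here only as a stand-in hash function. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* qsort comparator for uint32_t values, ascending. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Store up to MAX_PRINT of the smallest distinct shingle hashes of buf in
 * out[], in ascending order. Returns the number stored, or 0 if the input
 * is shorter than one shingle or memory cannot be allocated. */
static size_t shingleprint(const unsigned char *buf, size_t len,
                           uint32_t out[MAX_PRINT])
{
    if (len < SHINGLE_LEN)
        return 0;
    size_t total = len - SHINGLE_LEN + 1;
    uint32_t *all = malloc(total * sizeof *all);
    if (all == NULL)
        return 0;
    for (size_t i = 0; i < total; i++)
        all[i] = fnv1a(buf + i, SHINGLE_LEN);
    qsort(all, total, sizeof *all, cmp_u32);

    size_t n = 0;
    for (size_t i = 0; i < total && n < MAX_PRINT; i++)
        if (n == 0 || all[i] != out[n - 1])  /* skip duplicate hashes */
            out[n++] = all[i];
    free(all);
    return n;
}

/* Overlap ratio |A and B| / |A or B| of two ascending fingerprints.
 * Returns a value in [0, 1]; 0 if both fingerprints are empty. */
static double similarity(const uint32_t *a, size_t na,
                         const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0, common = 0;
    while (i < na && j < nb) {
        if (a[i] == b[j]) {
            common++;
            i++;
            j++;
        } else if (a[i] < b[j]) {
            i++;
        } else {
            j++;
        }
    }
    size_t uni = na + nb - common;
    return uni ? (double)common / (double)uni : 0.0;
}
```

A caller would read each file into memory, call shingleprint() on each
buffer, and pass the two fingerprints to similarity(). Identical inputs score
1.0, and the score drops as fewer shingles are shared. The actual tool may
well choose its hash, shingle size and scoring differently.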

