
Re: Tool to show maximal repeating patterns / structure in (text?) data



On Sun, Jul 13, 2008 at 3:26 PM, Dave Sherohman <dave@sherohman.org> wrote:
> You're on the right track here, at least for getting as far as detecting
> maximal-length identical strings.  As I recall, Huffman encoding should
> be what you're looking for.
>
> Another place to look would be search indexing algorithms.  I used to
> know a guy who'd done graduate work in that area and, from talking to
> him about it, it sounded like this is one of their key techniques.

Dave,

I've briefly scanned Wikipedia's pages on Huffman coding, DEFLATE,
LZ77 and LZ78, LZW, etc., and that's definitely what I'm looking for.
Wikipedia's entry on "Dictionary coder" even contains interesting
example algorithms. I can sense a fascinating project coming on...

Thank you for the pointers, Jaime

