Tool to show maximal repeating patterns / structure in (text?) data

Hi all,

Does anyone know of a tool which will analyse a block of data and find
structure / repeating patterns in it, and then somehow show that
structure to the user?

As an example, pretend I give it the following paragraph of text (but
I don't tell it that the following paragraph contains a string
repeated 4 times):

Support for Debian users who Support for Debian users who Support for
Debian users who Support for Debian users who

I'd like this tool to tell me that the previous paragraph contains the
string "Support for Debian users who " 4 times (and I'd like the tool
to have worked that out on its own).
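
For this trivial case, the behaviour being asked for can be sketched in
a few lines of Python. This is only an illustration, not an existing
tool: the function name is made up, and it assumes the whole input is
one unit repeated back to back, with line wrapping and runs of
whitespace treated as single spaces.

```python
def smallest_repeating_unit(text):
    """Return (unit, count) such that unit * count equals the normalised text."""
    # Collapse line breaks and runs of spaces, and end with one space so
    # that every copy of the unit, including the last, looks the same.
    normalised = " ".join(text.split()) + " "
    # A string with period p reappears inside its own doubling at offset p,
    # so the first match after offset 0 gives the smallest period.
    period = (normalised + normalised).find(normalised, 1)
    return normalised[:period], len(normalised) // period


paragraph = (
    "Support for Debian users who Support for Debian users who Support for\n"
    "Debian users who Support for Debian users who\n"
)
unit, count = smallest_repeating_unit(paragraph)
print(repr(unit), count)
```

If the input is not a pure repetition, the function simply reports the
whole text as a unit that occurs once.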

I realize that this example is trivial. I'd also like this tool to do
things which are more complicated, but since I can't find anything
that even helps me with my previous example, that will do for the time
being.

To preemptively answer the question "why do you want it / what is it
you're trying to achieve", I have a log of a dhcp conversation which
contains what I think is a repeated DHCPDISCOVER stanza. Rather than
the manual copy/paste/diff cycle, I'd like this tool to look at the
log and tell me: "Yup, you've got a stanza/paragraph repeated 4
times."
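
A rough sketch of that check for a log, under two assumptions that may
not hold for a real dhcp log: stanzas are separated by blank lines, and
repeated stanzas are byte-for-byte identical (no differing timestamps
or transaction ids). The function name is hypothetical.

```python
import re
from collections import Counter


def repeated_stanzas(log_text):
    """Return [(stanza, count), ...] for stanzas that occur more than once."""
    # Split on blank lines (assumed stanza separator) and drop empty pieces.
    stanzas = [s.strip() for s in re.split(r"\n\s*\n", log_text) if s.strip()]
    counts = Counter(stanzas)
    # most_common() orders by count, highest first.
    return [(stanza, n) for stanza, n in counts.most_common() if n > 1]
```

Fields that vary between otherwise identical stanzas would have to be
stripped or masked before counting for this to report them as repeats.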

I might be butting up against the edge of what's theoretically
possible ("computer science"-wise) but I think that my requirements
have something to do with lossless compression algorithms. Perhaps I
should start reading the source code for gzip/bzip2...?
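
As a quick illustration of the link to compression: a DEFLATE-based
compressor replaces repeated substrings with back-references, so
heavily repeated input shrinks a lot. The sketch below uses Python's
standard zlib module; the helper name is made up, and the ratio is only
a coarse hint that repetition exists, since it does not say what is
repeated or how often.

```python
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(zlib.compress(data, 9)) / len(data)


repeated = b"Support for Debian users who " * 4
print(compression_ratio(repeated))
```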

Thanks for your help, Jaime :-)