Tool to show maximal repeating patterns / structure in (text?) data

To: debian-user@lists.debian.org
Subject: Tool to show maximal repeating patterns / structure in (text?) data
From: "j t" <mark473@gmail.com>
Date: Sun, 13 Jul 2008 10:05:23 +0100
Message-id: <b88f52540807130205h6e89bfeam714aac9d8b00e465@mail.gmail.com>

Hi all,

Does anyone know of a tool which will analyse a block of data and find
structure / repeating patterns in it, and then somehow show that
structure to the user?

As an example, pretend I give it the following paragraph of text (but
I don't tell it that the following paragraph contains a string
repeated 4 times):

<snip>
Support for Debian users who Support for Debian users who Support for
Debian users who Support for Debian users who
</snip>

I'd like this tool to tell me that the previous paragraph contains the
string "Support for Debian users who " 4 times (and I'd like the tool
to have worked that out on its own).
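
To make the request concrete, here is a minimal brute-force sketch in
Python of the behaviour described above. It is not an existing tool; the
function name best_tandem_repeat is made up for illustration, and it
assumes that line wraps and runs of whitespace can be treated as single
spaces.

```python
def best_tandem_repeat(text):
    """Return (unit, count) for the back-to-back repeat that covers the most text.

    Brute force, roughly O(n^2): fine for a paragraph, too slow for big files.
    """
    # Collapse all whitespace (including line wraps) to single spaces and add
    # a trailing space so the last copy of the unit looks like the others.
    s = " ".join(text.split()) + " "
    n = len(s)
    best, best_cover = ("", 0), 0
    for period in range(1, n // 2 + 1):
        run = 0  # consecutive positions i where s[i] == s[i + period]
        for i in range(n - period):
            if s[i] != s[i + period]:
                run = 0
                continue
            run += 1
            count = (run + period) // period  # whole copies of the unit so far
            # Strict ">" means the shortest unit wins when coverage is tied.
            if count >= 2 and count * period > best_cover:
                start = i - run + 1
                best, best_cover = (s[start:start + period], count), count * period
    return best


paragraph = """Support for Debian users who Support for Debian users who Support for
Debian users who Support for Debian users who"""

unit, count = best_tandem_repeat(paragraph)
print(repr(unit), count)  # 'Support for Debian users who ' 4
```

This only finds the single best run of consecutive copies; it does not
handle repeats that are scattered through the data or that differ
slightly from one copy to the next.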

I realize that this example is trivial. I'd also like this tool to do
things which are more complicated, but since I can't find anything
that even helps me with my previous example, that will do for the time
being.

To preemptively answer the question "why do you want it / what is it
you're trying to achieve", I have a log of a DHCP conversation which
contains what I think is a repeated DHCPDISCOVER stanza. Rather than
the manual copy/paste/diff cycle, I'd like this tool to look at the
log and tell me: "Yup, you've got a stanza/paragraph repeated 4
times".
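
For the log case, a rough Python sketch of the check might look like the
following. It assumes stanzas are separated by blank lines and that the
repeated copies are character-for-character identical; the function name
repeated_stanzas and the sample text are made-up placeholders, not real
DHCP output.

```python
import re
from collections import Counter


def repeated_stanzas(log_text):
    """Return [(stanza, count), ...] for stanzas seen more than once, most frequent first."""
    # Assumption: stanzas are separated by one or more blank lines.
    stanzas = [s.strip() for s in re.split(r"\n\s*\n", log_text) if s.strip()]
    counts = Counter(stanzas)
    return [(stanza, n) for stanza, n in counts.most_common() if n > 1]


# Made-up placeholder text, not real DHCP client output.
discover = "DHCPDISCOVER (placeholder line 1)\n  (placeholder line 2)"
sample_log = "\n\n".join([discover, discover, "DHCPOFFER (placeholder)", discover, discover])

for stanza, n in repeated_stanzas(sample_log):
    print(f"{n} x\n{stanza}")
```

If every line carries a timestamp or transaction ID, those fields would
have to be stripped before counting, otherwise no two stanzas compare
equal.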

I might be butting up against the edge of what's theoretically
possible ("computer science"-wise) but I think that my requirements
have something to do with lossless compression algorithms. Perhaps I
should start reading the source code for gzip/bzip2...?
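
The compression link can at least be tried out without reading any
source code. The sketch below uses Python's zlib module (the same
DEFLATE method that gzip uses) to measure how well a piece of text
compresses. The helper name compression_ratio is made up, and a low
ratio only hints that repetition exists; it does not say what is
repeated.

```python
import zlib


def compression_ratio(text):
    """Compressed size divided by original size (lower suggests more repetition)."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0  # nothing to compress; treat as "no saving"
    return len(zlib.compress(raw, 9)) / len(raw)


repetitive = "Support for Debian users who " * 4
print(round(compression_ratio(repetitive), 2))
```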

Thanks for your help,
Jaime :-)

Follow-Ups:
- Re: Tool to show maximal repeating patterns / structure in (text?) data
  - From: "Javier Barroso" <javibarroso@gmail.com>
- Re: Tool to show maximal repeating patterns / structure in (text?) data
  - From: Dave Sherohman <dave@sherohman.org>
