Re: webcrawl to cache dynamic pages
> On Mon, May 02, 2005 at 01:27:41PM +0100, Richard Lyons wrote:
> > I am considering how to crawl a site which is dynamically generated,
> > and create a static version of all generated pages (or selected
> > generated pages). I guess it would be simplest to start with an
> > existing crawler, and bolt on some code. Or, alternatively, write a
> > script (perl, I fear) to modify the cache built by a crawler.
> > The idea is to allow a static ecommerce site to be generated from any
> > database-generated shopping cart system.
> > Any advice where to begin?
Well, I don't know of an "elegant" solution... one dirty approach would be to
first download the site with "wget -r". You would then get lots of files whose
names still contain the query strings, something like "index.php?cat=1&product=42".
So it would be quite easy to write a simple perl script that substitutes the
special characters ("?", "&", "=") with something more "static-like", giving you
names such as "index.php_cat_1_product_42.html".
Also, you would surely have to parse the content of each file and rewrite the
links inside it so they point at the renamed files.
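A minimal sketch of that approach, assuming a POSIX shell; the site URL, the
`?`/`&`/`=` mapping, and the `.html` suffix are all my own illustrative choices,
not anything a particular cart system dictates:

```shell
#!/bin/sh
# Step 1 -- mirror the dynamic site. wget's -k option rewrites links
# in the downloaded pages to point at the local copies:
#
#   wget -r -np -k http://example.com/shop/
#
# Step 2 -- turn a query-string filename into a "static-like" one by
# replacing the special characters and appending .html:
sanitize() {
    printf '%s.html' "$(printf '%s' "$1" | tr '?&=' '___')"
}

sanitize 'index.php?cat=1&product=42'
# prints: index.php_cat_1_product_42.html
```

Renaming the whole mirror is then a loop of `mv "$f" "$(sanitize "$f")"` over
`find mirror-dir -name '*\?*'`, and running the same character substitution with
sed over each page patches up the internal links, crudely: a real script would
only touch href/src attributes rather than every "?" in the text.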
Maybe too complicated?
No book comes out of a vacuum (G. Buehler)