
Re: wget asp pages recursively?



Matthew Weier O'Phinney helpfully answered my question about how one could download the pages in an ASP application with:

>   wget -m -L -t 5 -w 5 'http://someplace.com/some.asp?page=1'

This does indeed have the desired effect. Unfortunately, it yields an embarrassment of riches - I seem to be getting all the pages on the site, which is quite large. While I could just download the whole site, I'm only on a 56k modem. I actually may not have enough disk space on my box to store all the files I might download.

Is there some way I can limit the URLs wget will follow? Suppose all of my pages have URLs like:

http://someplace.com/some.asp?ThisIsFixed=1&page=1

I would like to only download URLs that have "ThisIsFixed=1" in them.
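[Editor's note: later versions of wget (1.14 and newer, released long after this thread) grew exactly this knob: --accept-regex tests each candidate URL against a regex before downloading it. A sketch, reusing the made-up someplace.com URL from above:]

```shell
# Hypothetical invocation: requires wget >= 1.14, which added
# --accept-regex. During the recursive crawl, only URLs whose full
# text matches the regex are fetched; everything else is skipped.
wget -m -L -t 5 -w 5 \
     --accept-regex 'ThisIsFixed=1' \
     'http://someplace.com/some.asp?ThisIsFixed=1&page=1'
```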

What would be really cool is if one could get wget to test a URL with a regular expression before downloading it, but I don't see a way to do that. (I have been studying the man page.)
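[Editor's note: with the tools available at the time, one workaround is to do the regex test outside wget: harvest a URL list first (say, from a spider run or a saved index page), filter it with grep, and feed the survivors back to wget with -i. A sketch, where urls.txt stands in for a hypothetical harvested list:]

```shell
# urls.txt is a hypothetical harvested URL list, one URL per line.
cat > urls.txt <<'EOF'
http://someplace.com/some.asp?ThisIsFixed=1&page=1
http://someplace.com/some.asp?ThisIsFixed=1&page=2
http://someplace.com/other.asp?page=3
EOF

# Keep only the URLs matching the fixed query parameter.
grep 'ThisIsFixed=1' urls.txt > wanted.txt

# Then hand the filtered list to wget (not run here, since
# someplace.com is the fictional host from the post):
# wget -t 5 -w 5 -i wanted.txt
```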

I could hack on wget's source code if necessary. If I have to do that, maybe somebody could give me a tip on where to look in the source. Maybe I could contribute a useful patch.

Thanks,

Mike
--
Michael D. Crawford
GoingWare Inc. - Expert Software Development and Consulting
http://www.goingware.com/
crawford@goingware.com

     Tilting at Windmills for a Better Tomorrow.


