
Re: wget asp pages recursively?



On Sat, Aug 31, 2002 at 08:36:11AM -0400, Michael D. Crawford wrote:
> Matthew Weier O'Phinney helpfully answered my question about how one could 
> download the pages in an asp application with:
> 
> >   wget -m -L -t 5 -w 5 http://someplace.com/some.asp&page=1
> 
> This does indeed have the desired effect.  Unfortunately, it yields an 
> embarrassment of riches - I seem to be getting all the pages on the site, 
> which is quite large.  
> 
> Is there some way I can limit the URLs wget will follow?
> 
> I would like to only download URLs that have "ThisIsFixed=1" in them.
> 
So, reading the wget man page: it says -m is equivalent to -r -N -l inf
-nr. If you set the recursion depth to 1 instead (-l 1), wget won't
follow links more than one level down. So if the only relative links on
_this_ level of the site point to pages named like that, you'll be
safe... of course, there are probably more; see the comments below...
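Concretely, the depth-limited variant would look something like this
(host and query string are the hypothetical ones from the original
post; note the `?` introducing the query string, which the quoted
command was missing):

```shell
# -r -N -l 1 replaces -m's infinite recursion with a single level;
# -L keeps wget on relative links, -t/-w are retries and wait as before.
wget -r -N -l 1 -L -t 5 -w 5 'http://someplace.com/some.asp?ThisIsFixed=1&page=1'
```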

> What would be really cool is if one could get wget to test a URL with a 
> regular expression before downloading it, but I don't see a way to do that. 
> (I have been studying the man page.)

Yes, that would be cool, but that's what other languages are for... You
could use Perl to parse the page(s) for the link(s) you're looking for,
and then have it run wget on each of them. Better yet, use the LWP
modules...
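That parse-and-filter idea might look roughly like this (the post
suggests Perl/LWP; sketched here in Python for illustration, with a
made-up page and base URL -- only links whose URLs match the pattern
get handed to wget):

```python
# Sketch: collect <a href> links from a page, keep only those matching
# a pattern (here "ThisIsFixed=1"), then print a wget command for each.
# The sample HTML and base URL below are fabricated for illustration.
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href attributes from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def wanted_links(html, base_url, pattern=re.compile(r"ThisIsFixed=1")):
    """Return absolute URLs for links in `html` that match `pattern`."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links
            if pattern.search(href)]

# Fabricated example page:
sample = """
<a href="some.asp?ThisIsFixed=1&amp;page=2">next</a>
<a href="other.asp?page=3">elsewhere</a>
"""

for url in wanted_links(sample, "http://someplace.com/"):
    print("wget -t 5 -w 5 '%s'" % url)
# prints: wget -t 5 -w 5 'http://someplace.com/some.asp?ThisIsFixed=1&page=2'
```

With LWP in Perl the shape is the same: fetch the page, extract and
filter the links, then fetch (or shell out to wget for) each match.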

--Matthew


