
Re: wget asp pages recursively?



On Sat, Aug 31, 2002 at 08:36:11AM -0400, Michael D. Crawford wrote:
> Is there some way I can limit the URL's wget will follow?  Suppose all of my 
> pages have a URL like:
> 
> http://someplace.com/some.asp&ThisIsFixed=1&page=1
> 
> I would like to only download URLs that have "ThisIsFixed=1" in them.
> 
> What would be really cool is if one could get wget to test a URL with a regular 
> expression before downloading it, but I don't see a way to do that.  (I have 
> been studying the man page.)

The "-A ACCLIST" option might help; it accepts shell wildcards.

If you need regular expressions, have a look at pavuk.
This is a tool similar to wget and has lots of options.
If they still don't meet your needs, you can even plug in your own
scripts to decide for each URL separately whether it should be
downloaded or used for recursion.
Also, it offers flexible renaming without breaking links (-fnrules
option). You can use the MIME type to set the file extension, although
this feature seems to be somewhat experimental.
I haven't tried the advanced options myself yet; I only learned about
them from reading the manpage.
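
To illustrate the kind of per-URL regular-expression test asked about
above, here is a minimal Python sketch. It is not pavuk's script
interface and not a wget feature; the names should_download and
filter_urls and the pattern itself are assumptions made up for this
example, and the URL is the one from the question.

```python
import re

# Hypothetical pattern: accept only URLs carrying the parameter
# ThisIsFixed with the exact value 1 (so ThisIsFixed=10 is rejected).
FIXED_PARAM = re.compile(r"[?&]ThisIsFixed=1(&|$)")


def should_download(url, pattern=FIXED_PARAM):
    """Return True if the URL matches the pattern anywhere in the string."""
    return pattern.search(url) is not None


def filter_urls(urls, pattern=FIXED_PARAM):
    """Keep only the URLs that pass the regular-expression test."""
    return [u for u in urls if should_download(u, pattern)]


if __name__ == "__main__":
    candidates = [
        "http://someplace.com/some.asp&ThisIsFixed=1&page=1",
        "http://someplace.com/some.asp&ThisIsFixed=1&page=2",
        "http://someplace.com/other.asp&page=3",
    ]
    for url in filter_urls(candidates):
        print(url)
```

A list of links extracted from a page could be passed through
filter_urls before anything is fetched, so only the matching URLs are
downloaded or followed.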

Cheers,
Mirko


