Re: scripting lynx

On Wed, Aug 01, 2001 at 06:08:29PM +0200, Russell Coker wrote:
> I tried both of them with no difference.

interesting.  either should have worked.

> > or use the LWP modules to make yourself a web-bot.
> I may have to do that.  Thanks for the suggestions.

you may need to set the Referer: header in the HTTP request. some cgi
scripts check the referer... (yes, that's pointless and stupid, but
it's quite common).

and set the user agent to something like:

	$ua->agent('Mozilla/4.51 (Macintosh; I; PPC)');
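putting those two together, a minimal LWP sketch looks like this (the
URLs are hypothetical placeholders -- substitute the real page and a
plausible referring page for whatever site you're poking at):

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;

# pretend to be netscape on a mac (see below for why)
$ua->agent('Mozilla/4.51 (Macintosh; I; PPC)');

# hypothetical URLs -- replace with the real cgi and its referring page
my $req = HTTP::Request->new(GET => 'http://www.example.com/results.cgi');
$req->header(Referer => 'http://www.example.com/search.html');

my $res = $ua->request($req);
print $res->content if $res->is_success;
```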

i generally use netscape on mac as my user-agent in web robots because:

a) moronic sites generally don't block netscape on macintosh 
   (i have seen some sites that block netscape on linux with a stupid
   message like "sorry, we don't support your browser/operating-system".
   unfortunately, brain-dead web design is not yet a capital crime)

b) said moronic sites generally won't output moronic IE-specific junk
   if they detect netscape.  sometimes.  if you're lucky.

btw, the perl HTML::TokeParser module is excellent for extracting stuff
from web pages. i used this (plus LWP::UserAgent, HTTP::Cookies, and
HTTP::Request) to write a wrapper script for searching the Melbourne
Trading Post site, which is one of the most brain-dead cretinous sites
i've ever had the misfortune of having to use.
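the TokeParser idiom is basically "walk to the next tag you care about,
grab its attributes, grab the text inside it". a small self-contained
sketch (the html here is a made-up stand-in for a fetched results page):

```perl
use strict;
use warnings;
use HTML::TokeParser;

# fake search-results page standing in for a real fetched document
my $html = '<html><body>'
         . '<a href="/ad/1">old bike</a> '
         . '<a href="/ad/2">cheap amp</a>'
         . '</body></html>';

# TokeParser takes a scalar ref (or a filehandle/filename)
my $p = HTML::TokeParser->new(\$html);

# pull out each link's href and its link text
while (my $tag = $p->get_tag('a')) {
    my $href = $tag->[1]{href};               # attr hashref is element 1
    my $text = $p->get_trimmed_text('/a');    # text up to the closing tag
    print "$href\t$text\n";
}
```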

there's also HTML::TableExtract for getting data out of html tables.
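with TableExtract you name the column headers you want and it finds the
matching table for you. another sketch with made-up data:

```perl
use strict;
use warnings;
use HTML::TableExtract;

# made-up listings table standing in for a real page
my $html = '<table>'
         . '<tr><th>Item</th><th>Price</th></tr>'
         . '<tr><td>old bike</td><td>$50</td></tr>'
         . '<tr><td>cheap amp</td><td>$120</td></tr>'
         . '</table>';

# match the table by its column headers; header row is skipped in output
my $te = HTML::TableExtract->new(headers => ['Item', 'Price']);
$te->parse($html);

# rows() is a shortcut to the data rows of the first matched table
for my $row ($te->rows) {
    print join("\t", @$row), "\n";
}
```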

these modules are all packaged for debian.


craig sanders <cas@taz.net.au>

Fabricati Diem, PVNC.
 -- motto of the Ankh-Morpork City Watch