
Re: batch browsing



Tony Godshall wrote:
> 
> [ksieben]
> > I would use wget:
> > wget -r -k -H -l X -nc http://google-search-results
> > where X is the depth to which you want to traverse the links
> 
> Yeah, that works for the first go, but how to get the
> subsequent pages (just the ones that are interesting)?

I see my first answer doesn't work. It's fine for getting the links
from a starting page, but google (or rather 7metasearch.com) doesn't
link directly to the interesting pages; it links to a script that
redirects to the page, which means wget won't follow it. (As you
noticed.)
OK, I wouldn't answer if I didn't have another solution, but it is
quite some work.
Best approach: make a directory; I called it bike.
Type your search words into the page, maybe: >bike + office<
Save the result into the freshly created dir as: result.asp
Now take my script (attachment), which automates the following steps:
extract the (interesting) URLs from the file and fetch them with
wget.
The wget parameters are a little different: wget -r -k -l X -nc (the
-H isn't needed).
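For reference, here is a minimal sketch of what the URL-extraction step might look like. This is not the attached gosearch.sh; the grep pattern and the result.asp filename are guesses based on the description above.

```shell
#!/bin/sh
# Sketch of the workflow described in the mail (assumptions, not gosearch.sh):
# pull http URLs out of a saved search-result page, then fetch each with wget.

extract_urls() {
    # Grab every http URL from the saved page; the character class is a
    # guess at where a URL ends in the result-page markup.
    grep -o 'http://[^]"<> ]*' "$1" | sort -u
}

# Usage, with recursion depth X=2:
#   extract_urls result.asp | while read url; do
#       wget -r -k -l 2 -nc "$url"
#   done
```

The sort -u is there so the same page isn't fetched twice when it appears in several result links.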

First I tried to make it more comfortable: if you examine the script,
you'll see that my original plan was to put the search words into the
script itself, request the result page with netcat, parse it, and then
fetch the interesting pages with wget.
But I didn't get that to work.
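That abandoned netcat approach could be sketched roughly like this; the host, path, and query string are placeholders, not taken from the actual script.

```shell
#!/bin/sh
# Hedged sketch of the netcat idea from the mail: build a raw HTTP/1.0 GET
# for the search page and pipe it through nc. Host and query are placeholders.

build_request() {
    # $1 = host, $2 = URL-encoded search words
    printf 'GET /search?q=%s HTTP/1.0\r\nHost: %s\r\n\r\n' "$2" "$1"
}

# Usage: save the raw response, then parse it for URLs:
#   build_request www.google.com bike+office | nc www.google.com 80 > result.asp
```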

A "feature" :)
Under some circumstances the google answer has a different page
layout; then there are no >[< and >]< around the URL, which means the
regular expression can't find the URL and wget gets nothing to do.

Actually it seems that wwwoffle works fine .... 

How do you get the bike back? :)


-- 
ingo dross information/security architecture
[]___¸
######-\
O_-_-_O-\

Attachment: gosearch.sh
Description: Bourne shell script

