
Re: check web links



Hi,
>>"Brandon" == Brandon Mitchell <bhmit1@mail.wm.edu> writes:

Brandon> Just found it: http://www.ics.uci.edu/pub/websoft/MOMspider/

	That's the old perl 4 version.

Brandon> What additions did you make?

	Hmm. I first started by giving it variable levels of verbosity
 (on a large site, MOMspider tended to be too chatty). Also, I gave it
 multiple levels of debuggability, and cleaned up the code to
 implement that. That was when the 5.000 versions started coming out. 
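
	For illustration only, variable verbosity can be sketched like
 this. It is Python rather than Perl, and it is not MOMspider's code:
 the level names and the Reporter class are made up for the example.

```python
import sys

# Hypothetical verbosity levels; the names are assumptions, not MOMspider's.
QUIET, NORMAL, CHATTY = 0, 1, 2


class Reporter:
    """Prints a message only when it is important enough for the chosen level."""

    def __init__(self, verbosity=NORMAL, stream=None):
        self.verbosity = verbosity
        self.stream = stream if stream is not None else sys.stderr

    def say(self, level, message):
        # A message tagged CHATTY is dropped unless verbosity is CHATTY or higher.
        if level <= self.verbosity:
            print(message, file=self.stream)
            return True
        return False
```

	On a large site you would run with a low level, so that only the
 messages tagged QUIET (broken links, say) get through.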

	MOMspider is a very object-oriented design, except that you
 could not have lists of lists very easily back then, so you had to
 go through contortions to achieve the object nature that comes so
 easily with perl5.

	I changed all the data structures to lists of lists, and sped up
 my MOMspider runs from 5 days to 8 hours. Also, I used LWP to make
 the thing far smaller.
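
	To show the kind of difference this makes, here is a rough sketch
 in Python rather than Perl. It is not MOMspider's actual data layout:
 the link records, field order, and function names are all made up.
 The "flat" version packs each record into one delimited string, which
 is roughly what you end up doing without nested lists; the "nested"
 version keeps the referrers as a real list.

```python
# Hypothetical link records: (url, status, referrers). Not MOMspider's layout.
SEP = "\x1c"  # field separator, in the spirit of Perl's default $; value


def pack_flat(records):
    """Workaround without nested lists: join each record into one string."""
    flat = {}
    for url, status, referrers in records:
        flat[url] = SEP.join([str(status)] + list(referrers))
    return flat


def referrers_flat(flat, url):
    """Every lookup has to split the packed string apart again."""
    fields = flat[url].split(SEP)
    return fields[1:]


def pack_nested(records):
    """List-of-lists style: the referrers stay a list inside the record."""
    return {url: [status, list(referrers)] for url, status, referrers in records}


def referrers_nested(nested, url):
    """Lookup is plain indexing, with no re-parsing."""
    return nested[url][1]


records = [
    ("http://example.org/a.html", 200, ["http://example.org/index.html"]),
    ("http://example.org/b.html", 404,
     ["http://example.org/index.html", "http://example.org/a.html"]),
]
flat = pack_flat(records)
nested = pack_nested(records)
```

	Both forms hold the same information; the nested one avoids the
 repeated join and split on every access, and the code around it is
 shorter.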

	That's all I remember at the moment. (It's been a
 while). There were numerous other little fixes that came out of that
 ... Oh, you could also run weblint/nsgmls/validation on the links, on
 a per tree basis, and mail the results back (I used to be a martinet
 when it came to validating HTML).

	manoj
 who hates diskeating NT
-- 
 Hey, I had to let awk be better at *something*...  :-) --Larry Wall
 in <1991Nov7.200504.25280@netlabs.com>
Manoj Srivastava               <url:mailto:srivasta@acm.org>
Mobile, Alabama USA            <url:http://www.datasync.com/%7Esrivasta/>

