Re: check web links
Hi,
>>"Brandon" == Brandon Mitchell <bhmit1@mail.wm.edu> writes:
Brandon> Just found it: http://www.ics.uci.edu/pub/websoft/MOMspider/
That's the old perl 4 version.
Brandon> What additions did you make?
Hmm. I first started by giving it variable levels of verbosity
(on a large site, MOMspider tended to be too chatty). I also gave it
multiple levels of debugging output, and cleaned up the code to
implement that. That was when the Perl 5.000 releases started coming out.
MOMspider is a very object-oriented design, except that you
could not easily have lists of lists back then, so you had to
go through contortions to achieve the object nature that comes so
easily with perl5.
I changed all the data structures to lists of lists, and sped up
my MOMspider runs from 5 days to 8 hours. I also used LWP to make
the thing far smaller.
That's all I remember at the moment (it's been a
while). There were numerous other little fixes that came out of that.
Oh, you could also run weblint/nsgmls/validation on the links, on
a per-tree basis, and mail the results back (I used to be a martinet
when it came to validating HTML).
manoj
who hates disk-eating NT
--
Hey, I had to let awk be better at *something*... :-) --Larry Wall
in <1991Nov7.200504.25280@netlabs.com>
Manoj Srivastava <url:mailto:srivasta@acm.org>
Mobile, Alabama USA <url:http://www.datasync.com/%7Esrivasta/>
--
TO UNSUBSCRIBE FROM THIS MAILING LIST: e-mail the word "unsubscribe" to
debian-user-request@lists.debian.org .
Trouble? e-mail to templin@bucknell.edu .