Re: LinkWalker
On 8 Jan 2002, at 9:56, Jesse Goerz wrote:
> On Tuesday 08 January 2002 01:38, Russell Coker wrote:
> > On Mon, 7 Jan 2002 23:31, Nathan Strom wrote:
> > > > I have a nasty web spider with an agent name of
> > > > "LinkWalker" downloading everything on my site
> > > > (including .tgz files). Does anyone know anything
> > > > about it?
> > >
> > > It's apparently a link-validation robot operated by a
> > > company called SevenTwentyFour Incorporated, see:
> > > http://www.seventwentyfour.com/tech.html
> >
> > Oops.
> >
> > Actually they sent me an offer of a free trial to their
> > service (which seems quite useful). The free trial gave
> > me some useful stats and let me fix a bunch of broken
> > links (of course I didn't pay).
>
> You can do the same thing with wget:
> --spider
> When invoked with this option, Wget will behave as a Web
> spider, which means that it will not download the pages,
> just check that they are there. You can use it to check
> your bookmarks, e.g. with:
>
> wget --spider --force-html -i bookmarks.html
>
> This feature needs much more work for Wget to get close
> to the functionality of real WWW spiders.
>
> You'll be checking more than bookmarks but you get the idea.
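To spider a whole site rather than a bookmarks file, you can run wget recursively and then pull the failures out of its log. A minimal sketch, assuming wget 1.x's -nv log format (the "broken link" message and the URL-then-status layout are assumptions; check your version's output and adjust the grep accordingly):

```shell
# Real run would look like (not executed here):
#   wget --spider -r -nd -nv -o spider.log http://www.example.org/

# Stand-in log so the pipeline below can be shown end to end
# (assumption: this mimics wget's recursive spider output):
cat > spider.log <<'EOF'
http://www.example.org/index.html:
200 OK
http://www.example.org/old/page.html:
Remote file does not exist -- broken link!!!
EOF

# Report every URL flagged as broken: grab the line before each
# "broken link" message, then keep only the URL lines.
grep -B1 'broken link' spider.log | grep '^http'
```

On the sample log above this prints the one dead URL; on a real site you would get one line per broken link found during the crawl.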
>
In case you are running ht://dig, there's an add-on
on the contributed works page that parses htdig's output
and generates a broken-links report from it.
Since htdig touches every link anyway, that comes almost for free.
Cheers,
Marcel
--
__
.´ `.
: :' ! Enjoy
`. `´ Debian/GNU Linux
`- Now even on the 5 Euro banknote!