
Re: LinkWalker



On 8 Jan 2002, at 9:56, Jesse Goerz wrote:

> On Tuesday 08 January 2002 01:38, Russell Coker wrote:
> > On Mon, 7 Jan 2002 23:31, Nathan Strom wrote:
> > > > I have a nasty web spider with an agent name of
> > > > "LinkWalker" downloading everything on my site
> > > > (including .tgz files).  Does anyone know anything
> > > > about it?
> > >
> > > It's apparently a link-validation robot operated by a
> > > company called SevenTwentyFour Incorporated, see:
> > > http://www.seventwentyfour.com/tech.html
> >
> > Oops.
> >
> > Actually they sent me an offer of a free trial to their
> > service (which seems quite useful).  The free trial gave
> > me some useful stats and let me fix a bunch of broken
> > links (of course I didn't pay).
>
> You can do the same thing with wget:
> --spider
>    When invoked with this option, Wget will behave as a Web
>    spider, which means that it will not download the pages,
>    just check that they are there.  You can use it to check
>    your bookmarks, e.g. with:
>
>         wget --spider --force-html -i bookmarks.html
>
>    This feature needs much more work for Wget to get close
>    to the functionality of real WWW spiders.
>
> You'll be checking more than bookmarks but you get the idea.
>
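
To check a whole site rather than a bookmarks file, something
along these lines should work (www.example.com and spider.log
are placeholders, and as the manual text above says the spider
mode is still rough, so results will vary by wget version):

     # Crawl the site without saving pages, logging to a file
     wget --spider -r -nv -o spider.log http://www.example.com/

     # Failed fetches show up in the log; pull them out with
     # grep (the exact wording may need tweaking)
     grep -i 'error' spider.log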

In case you are running ht://dig, there's an add-on
on the contributed works page that parses htdig's output
and generates a broken-links report from it.
Since htdig touches every link anyway, it's quite a
natural fit.
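
The idea in shell form (a minimal sketch; rundig is htdig's
wrapper script, and the exact wording of its log messages
varies by version, so treat the grep pattern below as an
assumption rather than gospel):

     # Re-index verbosely and keep the full log
     rundig -v > htdig.log 2>&1

     # URLs the indexer could not retrieve are flagged in the
     # log; adjust the pattern to what your version prints
     grep -i 'not found' htdig.log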

Cheers,
Marcel


--
   __
 .´  `.
 : :' !   Enjoy
 `. `´   Debian/GNU Linux
   `-   Now even on the 5 Euro banknote!


