On Sun, 23 Dec 2001 20:28, Nick Jennings wrote:
> Why don't you just update your robots.txt to explicitly specify which
> files you do or don't allow spiders access to? If it's a rule-abiding
> spider, that will be the end of it.
I wasn't aware that there was any format to robots.txt; I thought that the
mere presence of such a file would prevent robots from visiting.
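For reference, robots.txt does have a simple format, defined by the Robots
Exclusion Standard: records consisting of a User-agent line followed by one
or more Disallow lines. A minimal sketch (the /private/ path is only an
illustration, not a real path on my site):

```
# Block the misbehaving spider entirely
User-agent: LinkWalker
Disallow: /

# All other robots: keep out of one directory only
User-agent: *
Disallow: /private/
```

An empty Disallow line would mean "everything is allowed", so the mere
presence of the file on its own does nothing.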
As for rule-abiding spiders, such programs will not download files ending in
.wav, .mp3, .gz, .tgz, or .zip, so I won't even see them.
That's why I usually don't even notice responsible web spiders such as Google
when browsing my web logs!
> On Sun, Dec 23, 2001 at 05:41:47PM +0100, Russell Coker wrote:
> > I have a nasty web spider with an agent name of "LinkWalker" downloading
> > everything on my site (including .tgz files). Does anyone know anything
> > about it?
> > I've added the following to my firewall setup to stop further attacks...
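For anyone curious, a packet-filter rule to block such a spider might look
like the following sketch (192.0.2.10 is a placeholder address, not the
spider's real one; the actual rules in my setup differ):

```
# Drop all traffic from the offending spider's address (placeholder IP)
iptables -A INPUT -s 192.0.2.10 -j DROP
```

This blocks by source address; a spider that changes its address would need
to be caught again, which is why matching on the agent name at the web
server level can also be worth considering.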
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/ My home page