Why don't you just update your robots.txt to explicitly specify which
files you do or don't allow spiders to access? If it's a rule-abiding
spider, that will be the end of it.
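For instance, a robots.txt along these lines would tell that particular
agent to stay out entirely while leaving other crawlers only barred from
the archive files (the "LinkWalker" agent name is taken from the report
below; the /tgz/ path is just an illustration, adjust it to wherever the
.tgz files actually live):

```
# Block the LinkWalker agent completely
User-agent: LinkWalker
Disallow: /

# All other spiders: keep out of the archive directory only
User-agent: *
Disallow: /tgz/
```

Of course this only works if the spider bothers to fetch and honour
robots.txt; a misbehaving one still needs the firewall treatment.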
On Sun, Dec 23, 2001 at 05:41:47PM +0100, Russell Coker wrote:
> I have a nasty web spider with an agent name of "LinkWalker" downloading
> everything on my site (including .tgz files). Does anyone know anything
> about it?
> I've added the following to my firewall setup to stop further attacks...
> # crappy LinkWalker - evil spider that downloads every file including .tgz on
> # the site
> iptables -A INPUT -j logitrej -p tcp -s 18.104.22.168 -d 0.0.0.0/0 --dport www
> http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
> http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
> http://www.coker.com.au/projects.html Projects I am working on
> http://www.coker.com.au/~russell/ My home page
> To UNSUBSCRIBE, email to email@example.com
> with a subject of "unsubscribe". Trouble? Contact firstname.lastname@example.org
From: Russell Coker <email@example.com>