
Stopping webcrawlers.



Greetings all

I am developing a list of broken webcrawlers that are repeatedly
downloading my entire web site, including the hidden stuff.

These crawlers/bots are ignoring my robots.txt files and aren't just 
indexing the site, but are downloading every single bit of every file 
there.

This is burning up my upload bandwidth and amounts to a DDoS when 4 or 5
bots all go into this pull-it-all mode at the same time.
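Before deciding what to block, it helps to know exactly which networks are doing the pulling. A minimal sketch (the log path and common-log-format assumption are mine, not from the post) that tallies requests per /24 from an Apache access log:

```python
# Sketch: count requests per /24 network from an Apache access log,
# assuming the client IP is the first whitespace-separated field
# (common/combined log format).  Paths and names are placeholders.
import ipaddress
from collections import Counter

def top_offender_networks(log_lines, top_n=5):
    """Return the top_n /24 networks by request count."""
    counts = Counter()
    for line in log_lines:
        ip_field = line.split(" ", 1)[0]
        try:
            net = ipaddress.ip_network(ip_field + "/24", strict=False)
        except ValueError:
            continue  # skip malformed or truncated lines
        counts[str(net)] += 1
    return counts.most_common(top_n)

# Usage with a real log, e.g.:
# with open("/var/log/apache2/access.log") as f:
#     print(top_offender_networks(f))
```

This gives hard numbers on which /24s are actually responsible, so a block can be as narrow as the evidence allows.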

How do I best deal with these poorly written bots? I can target the
individual addresses, of course, but I have chosen to block the /24, and
that seems not to bother them for more than 30 minutes. It's also too
broad a brush, blocking legitimate addresses as well. Restarting apache2
also works, for half an hour or so, but I may be interrupting a
legitimate request for a realtime kernel whose build tree is around
2.7GB in tgz format.
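For what it's worth, the offenders can also be refused at the HTTP layer instead of the firewall, which avoids interrupting in-flight legitimate transfers on other addresses and survives an apache2 restart. A minimal sketch in Apache 2.4 syntax; the file path and the networks shown are placeholders, not from the post:

```apache
# /etc/apache2/conf-available/badbots.conf  (placeholder path)
# Deny only the offending networks; everyone else is unaffected.
<Location "/">
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.0/24
        Require not ip 198.51.100.0/24
    </RequireAll>
</Location>
```

Enable it with `a2enconf badbots` and reload; denied clients get a 403 rather than a dropped connection, which some broken bots treat as a reason to back off.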

How do I get their attention and stop the DDoS? Or is this a war you
cannot win?

Thanks all.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>
