Stopping webcrawlers.
Greetings all
I am developing a list of broken webcrawlers that are repeatedly
downloading my entire web site, including the hidden stuff.
These crawlers/bots are ignoring my robots.txt files and aren't just
indexing the site, but are downloading every single bit of every file
there.
This is burning up my upload bandwidth and constitutes a DDoS when 4 or 5
bots all go into this pull-it-all mode at the same time.
How do I best deal with these poorly written bots? I can target the
individual address, of course, but have chosen to block the /24; that
seems not to bother them for more than 30 minutes. It's also too broad a
brush, blocking legit addresses. Restarting apache2 also works, for
half an hour or so, but I may be interrupting a legit request for a
realtime kernel whose build tree is around 2.7GB in tgz format.
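[Editor's note: not part of Gene's original post. A minimal sketch of one way to spot the bulk downloaders before deciding what to block: count requests per client IP from the Apache access log and flag the heavy hitters. The log format assumed is Apache's "combined" format; the sample lines, the threshold, and the log path mentioned in the comments are all illustrative assumptions.]

```python
import re
from collections import Counter

# Hypothetical sample lines in Apache "combined" log format; in practice
# these would be read from something like /var/log/apache2/access.log
# (path is an assumption, not from the original post).
SAMPLE_LOG = """\
203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET /a.tgz HTTP/1.1" 200 1024 "-" "BadBot/1.0"
203.0.113.7 - - [01/Jan/2024:00:00:02 +0000] "GET /b.tgz HTTP/1.1" 200 2048 "-" "BadBot/1.0"
203.0.113.7 - - [01/Jan/2024:00:00:03 +0000] "GET /c.tgz HTTP/1.1" 200 4096 "-" "BadBot/1.0"
198.51.100.9 - - [01/Jan/2024:00:00:04 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
"""

# The client IP is the first whitespace-delimited field of each log line.
IP_RE = re.compile(r"^(\S+) ")

def heavy_hitters(log_text, threshold):
    """Return client IPs that made more than `threshold` requests."""
    counts = Counter()
    for line in log_text.splitlines():
        m = IP_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return [ip for ip, n in counts.items() if n > threshold]

print(heavy_hitters(SAMPLE_LOG, 2))  # only the bulk downloader exceeds the threshold
```

The idea being that blocking only the individual addresses this turns up (with iptables, ipset, or a tool like fail2ban) avoids the too-broad /24 brush, at the cost of re-running the scan as the bots hop addresses.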
How do I get their attention to stop the DDOS? Or is this a war you
cannot win?
Thanks all.
Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>