
Stopping webcrawlers.



Greetings all

I am developing a list of broken webcrawlers that are repeatedly
downloading my entire web site, including the hidden stuff.

These crawlers/bots are ignoring my robots.txt files and aren't just 
indexing the site, but are downloading every single bit of every file 
there.

This is burning up my upload bandwidth and amounts to a DDoS when 4 or 5
bots all go into this pull-it-all mode at the same time.
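Before deciding what to block, it helps to know exactly which networks are doing the pulling. A minimal sketch (the log path and common-log-format assumption are mine, not from the post) that tallies requests per /24 from an Apache access log:

```python
# Sketch: count requests per /24 network from an Apache access log,
# assuming the client IP is the first whitespace-separated field
# (common/combined log format).  Paths and names are placeholders.
import ipaddress
from collections import Counter

def top_offender_networks(log_lines, top_n=5):
    """Return the top_n /24 networks by request count."""
    counts = Counter()
    for line in log_lines:
        ip_field = line.split(" ", 1)[0]
        try:
            net = ipaddress.ip_network(ip_field + "/24", strict=False)
        except ValueError:
            continue  # skip malformed or truncated lines
        counts[str(net)] += 1
    return counts.most_common(top_n)

# Usage with a real log, e.g.:
# with open("/var/log/apache2/access.log") as f:
#     print(top_offender_networks(f))
```

This gives hard numbers on which /24s are actually responsible, so a block can be as narrow as the evidence allows.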

How do I best deal with these poorly written bots? I can target the
individual addresses, of course, but I have chosen to block the /24, and
that seems not to bother them for more than 30 minutes. It's also too
broad a brush, blocking legitimate addresses as well. Restarting apache2
also works, for half an hour or so, but I may be interrupting a
legitimate request for a realtime kernel whose build tree is around
2.7GB in tgz format.
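For what it's worth, the offenders can also be refused at the HTTP layer instead of the firewall, which avoids interrupting in-flight legitimate transfers on other addresses and survives an apache2 restart. A minimal sketch in Apache 2.4 syntax; the file path and the networks shown are placeholders, not from the post:

```apache
# /etc/apache2/conf-available/badbots.conf  (placeholder path)
# Deny only the offending networks; everyone else is unaffected.
<Location "/">
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.0/24
        Require not ip 198.51.100.0/24
    </RequireAll>
</Location>
```

Enable it with `a2enconf badbots` and reload; denied clients get a 403 rather than a dropped connection, which some broken bots treat as a reason to back off.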

How do I get their attention and stop the DDoS? Or is this a war you
cannot win?

Thanks all.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>
