
Re: Stopping webcrawlers.



On 11/3/2019 4:04 PM, Gene Heskett wrote:
> Greetings all
>
> I am developing a list of broken webcrawlers who are repeatedly
> downloading my entire web site including the hidden stuff.
>
> These crawlers/bots are ignoring my robots.txt files and aren't just
> indexing the site, but are downloading every single bit of every file
> there.
>
> This is burning up my upload bandwidth and constitutes a DDOS when 4 or 5
> bots all go into this pull it all mode at the same time.
>
> How do I best deal with these poorly written bots? I can target the
> individual addresses of course, but have chosen to block the /24, though
> that seems not to bother them for more than 30 minutes. It's also too
> broad a brush, blocking legit addresses' access. Restarting apache2 also
> works, for half an hour or so, but I may be interrupting a legit request
> for a realtime kernel whose build tree is around 2.7GB in tgz format.
>
> How do I get their attention to stop the DDOS?  Or is this a war you
> cannot win?
>

'fail2ban' for the bots that do not respect robots.txt.
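
A minimal sketch of such a setup, assuming Apache's combined access log at its
usual Debian location; the jail/filter name, the example disallowed paths, and
the thresholds are all placeholders to adjust for your site:

```
# /etc/fail2ban/filter.d/apache-badbots-custom.conf
# Ban hosts requesting paths your robots.txt disallows.
# Replace /hidden and /private with your actual disallowed directories.
[Definition]
failregex = ^<HOST> .* "(GET|POST|HEAD) /(hidden|private)[^"]*
ignoreregex =

# /etc/fail2ban/jail.d/apache-badbots-custom.conf
[apache-badbots-custom]
enabled  = true
port     = http,https
filter   = apache-badbots-custom
logpath  = /var/log/apache2/access.log
maxretry = 5
findtime = 600
bantime  = 86400
```

This bans only the offending IP (not a whole /24), and only after repeated
hits on paths no well-behaved crawler should touch, so legit visitors and
large legitimate downloads are left alone.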

--
John Doe
