
Re: Stopping webcrawlers.



On Sunday 03 November 2019 10:34:09 john doe wrote:

> On 11/3/2019 4:04 PM, Gene Heskett wrote:
> > Greetings all
> >
> > I am developing a list of broken webcrawlers who are repeatedly
> > downloading my entire web site including the hidden stuff.
> >
> > These crawlers/bots are ignoring my robots.txt files and aren't just
> > indexing the site, but are downloading every single bit of every
> > file there.
> >
> > This is burning up my upload bandwidth and constitutes a DDoS when 4
> > or 5 bots all go into this pull-it-all mode at the same time.
> >
> > How do I best deal with these poorly written bots? I can target the
> > individual address of course, but have chosen to block the /24, but
> > that seems not to bother them for more than 30 minutes. It's also too
> > broad a brush, blocking legit addresses' access. Restarting apache2
> > also works, for half an hour or so, but I may be interrupting a legit
> > request for a realtime kernel whose build tree is around 2.7 GB in
> > tgz format.
> >
> > How do I get their attention to stop the DDoS?  Or is this a war you
> > cannot win?
>
> 'fail2ban' for the bots that don't respect robots.txt.
>
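So far the /24 blocking mentioned above has just been a bare iptables
drop done by hand, roughly like the sketch below (the address is a
made-up example from the documentation range, not one of the actual
offenders):

  # drop everything from the offending /24
  iptables -I INPUT -s 192.0.2.0/24 -j DROP
  # review the current blocks, numbered
  iptables -L INPUT -n --line-numbers
  # remove a rule again by its number
  iptables -D INPUT 1

which, as said above, is too broad a brush and only slows them down for
half an hour or so.
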
fail2ban wasn't installed by this stretch version.  It is now, and I'm
reading the man pages.  Frankly this looks dangerous for a beginning
user to attempt.  There ought to be a startup tutorial that covers
setting up the logging, then specifying, from those logs, who you want
blocked.  Is there a formal tutorial on setting this up someplace?
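
From a first read of the man pages, the place to start looks like a
jail in /etc/fail2ban/jail.local pointed at the apache2 access log.
The filter name, log path and numbers below are only my guesses from
the examples shipped with the package, untested so far:

  # /etc/fail2ban/jail.local -- minimal sketch, untested
  # ban after 2 hits within 600 s, for 86400 s (one day)
  [apache-badbots]
  enabled  = true
  port     = http,https
  filter   = apache-badbots
  logpath  = /var/log/apache2/access.log
  maxretry = 2
  findtime = 600
  bantime  = 86400

followed by a restart of the fail2ban service and a look at
'fail2ban-client status apache-badbots' to see whether anything is
actually being caught.  If that's in the right ballpark, a pointer to
a proper writeup would still be welcome.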

Thanks, John Doe.

> --
> John Doe


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Gene's Web page <http://geneslinuxbox.net:6309/gene>

