
Re: Stopping webcrawlers.



On Sunday 03 November 2019 12:11:16 john doe wrote:

> On 11/3/2019 5:32 PM, Gene Heskett wrote:
> > On Sunday 03 November 2019 10:34:09 john doe wrote:
> >> On 11/3/2019 4:04 PM, Gene Heskett wrote:
> >>> Greetings all
> >>>
> >>> I am developing a list of broken webcrawlers who are repeatedly
> >>> downloading my entire web site including the hidden stuff.
> >>>
> >>> These crawlers/bots are ignoring my robots.txt files and aren't
> >>> just indexing the site, but are downloading every single bit of
> >>> every file there.
> >>>
> >>> This is burning up my upload bandwidth and constitutes a DDoS when
> >>> 4 or 5 bots all go into this pull-it-all mode at the same time.
> >>>
> >>> How do I best deal with these poorly written bots? I can target
> >>> the individual address of course, but have chosen to block the
> >>> /24, but that seems not to bother them for more than 30 minutes.
> >>> It's also too broad a brush, blocking access for legit addresses.
> >>> Restarting apache2 also works, for half an hour or so, but I may be
> >>> interrupting a legit request for a realtime kernel whose built
> >>> tree is around 2.7GB in tgz format.
> >>>
> >>> How do I get their attention to stop the DDOS?  Or is this a war
> >>> you cannot win?
> >>
> >> 'fail2ban' for the bots that do not respect robots.txt.
> >
> > It wasn't installed by this stretch version.  It is now, and I'm
> > reading the man pages. Frankly this looks dangerous for a beginning
> > user to run. There ought to be a startup tutorial based on
> > setting up the logging, then specifying who you want blocked based
> > on what the logs show.  Is there a formal tutorial for setting this
> > up someplace?
>
> Those are more hints than a howto:
>
> https://askubuntu.com/questions/1116001/block-badbot-with-fail2ban-via-user-agents-in-access-log
> https://www.booleanworld.com/blocking-bad-bots-fail2ban/
>
> Or with Iptables:
> https://blog.nintechnet.com/how-to-block-w00tw00t-at-isc-sans-dfind-and-other-web-vulnerability-scanners/
> https://javapipe.com/blog/iptables-ddos-protection/
>
>
> I guess I would implement both approaches.
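
For anyone following along, here is a minimal sketch of the log-scanning
idea behind the fail2ban approach: read the access log, count requests
per client address inside a sliding time window, and flag the addresses
that go over a limit. It is not fail2ban itself and not taken from the
links above. It assumes Apache's common/combined log format; the function
name find_heavy_clients and the max_hits and window values are made up
for illustration and would need tuning for a real site.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Matches the start of an Apache common/combined log line:
# client address, two ignored fields, then the bracketed timestamp.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def find_heavy_clients(lines, max_hits=100, window=timedelta(seconds=60)):
    """Return client addresses that made more than max_hits requests
    within any span of length `window`. Both limits are illustrative
    assumptions, not recommended values."""
    recent = defaultdict(deque)   # address -> timestamps still inside the window
    flagged = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue              # skip lines that are not in the expected format
        address, stamp = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        hits = recent[address]
        hits.append(when)
        # Drop requests that have aged out of the window.
        while when - hits[0] > window:
            hits.popleft()
        if len(hits) > max_hits:
            flagged.add(address)
    return flagged
```

Fed the lines of the access log (on a stock Debian apache2 install that
is usually /var/log/apache2/access.log, but check your own config), it
returns the set of addresses worth a closer look before anything gets
blocked. A real setup would hand those addresses to the firewall and
expire the blocks after a while, which is the part fail2ban automates.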
>
>
> Does your website really need to be available to the world?
> Can't you consider a VPS with anti-DDoS capability?
>
I wouldn't have the foggiest idea how to set that up, or how to set up a
login/password.  What I have there is of very little interest to folks
not running an rpi3b or rpi4b, or a TRS-80 Color Computer.

What advantage would the VPS offer? And I likely won't have time to set
it up, as I'm scheduled for a new aortic valve to be installed Tuesday.
Mine is 85 yo and about wore out. Pumping efficiency is about 30% due to
leakage.
>
> HTH.
>
> --
> John Doe


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>

