
Re: Stopping webcrawlers.



On 11/3/2019 5:32 PM, Gene Heskett wrote:
> On Sunday 03 November 2019 10:34:09 john doe wrote:
>
>> On 11/3/2019 4:04 PM, Gene Heskett wrote:
>>> Greetings all
>>>
>>> I am developing a list of broken webcrawlers who are repeatedly
>>> downloading my entire web site including the hidden stuff.
>>>
>>> These crawlers/bots are ignoring my robots.txt files and aren't just
>>> indexing the site, but are downloading every single bit of every
>>> file there.
>>>
>>> This is burning up my upload bandwidth and constitutes a DDOS when 4
>>> or 5 bots all go into this pull it all mode at the same time.
>>>
>>> How do I best deal with these poorly written bots? I can target the
>>> individual address of course, but have chosen to block the /24, but
>>> that seems not to bother them for more than 30 minutes. It's also
>>> too broad a brush, blocking access for legit addresses. Restarting apache2
>>> also works, for half an hour or so, but I may be interrupting a legit
>>> request for a realtime kernel whose built tree is around 2.7GB in
>>> tgz format
>>>
>>> How do I get their attention to stop the DDOS?  Or is this a war you
>>> cannot win?
>>
>> 'fail2ban' for the bots that do not respect robots.txt.
>>
> Wasn't installed by this stretch version.  Is now, reading man pages.
> Frankly this looks dangerous when run by beginning users.
> There ought to be a startup tutorial based on setting up the logging,
> then specifying who you want blocked from reading the logs.  Is there a
> formal tut of setting this up someplace?
>

These are more hints than a how-to:

https://askubuntu.com/questions/1116001/block-badbot-with-fail2ban-via-user-agents-in-access-log
https://www.booleanworld.com/blocking-bad-bots-fail2ban/
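
To give a feel for what fail2ban does with the access log, here is a rough
Python sketch of the idea only: match each log line with a regex, count hits
per address inside a time window, and flag the addresses that go over a
limit. It is not fail2ban code. It assumes Apache's "combined" log format,
and the names find_offenders, max_hits and window are made up for this
illustration.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumes Apache "combined" format:
# IP ident user [timestamp] "request" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<ref>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_offenders(lines, max_hits=100, window=timedelta(minutes=10)):
    """Return the set of IPs with more than max_hits requests inside window.

    Hypothetical helper for illustration; lines must be in time order,
    which is how Apache writes them.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    offenders = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        hits = recent[m.group("ip")]
        hits.append(ts)
        # Forget requests that have fallen out of the sliding window.
        while hits and ts - hits[0] > window:
            hits.popleft()
        if len(hits) > max_hits:
            offenders.add(m.group("ip"))
    return offenders
```

You would feed it the open access log, e.g. find_offenders(open(path)) with
path pointing at your own log file. In fail2ban proper, the regex goes in a
filter as failregex, and the limit and window are the jail's maxretry and
findtime settings; the ban itself is then done through the firewall for
bantime seconds.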

Or with iptables:
https://blog.nintechnet.com/how-to-block-w00tw00t-at-isc-sans-dfind-and-other-web-vulnerability-scanners/
https://javapipe.com/blog/iptables-ddos-protection/


I guess I would implement both approaches.


Does your website really need to be available to the world?
Couldn't you consider a VPS with anti-DDoS capability?


HTH.

--
John Doe

