
Re: Stopping webcrawlers.



On 11/3/2019 6:36 PM, Gene Heskett wrote:
> On Sunday 03 November 2019 12:11:16 john doe wrote:
>
>> On 11/3/2019 5:32 PM, Gene Heskett wrote:
>>> On Sunday 03 November 2019 10:34:09 john doe wrote:
>>>> On 11/3/2019 4:04 PM, Gene Heskett wrote:
>>>>> Greetings all
>>>>>
>>>>> I am developing a list of broken webcrawlers that are repeatedly
>>>>> downloading my entire web site, including the hidden stuff.
>>>>>
>>>>> These crawlers/bots are ignoring my robots.txt files and aren't
>>>>> just indexing the site, but are downloading every single bit of
>>>>> every file there.
>>>>>
>>>>> This is burning up my upload bandwidth and constitutes a DDoS when
>>>>> 4 or 5 bots all go into this pull-it-all mode at the same time.
>>>>>
>>>>> How do I best deal with these poorly written bots? I can target
>>>>> the individual address of course, but I have chosen to block the
>>>>> /24, though that seems not to bother them for more than 30
>>>>> minutes. It's also too broad a brush, blocking legit addresses'
>>>>> access. Restarting apache2 also works, for half an hour or so,
>>>>> but I may be interrupting a legit request for a realtime kernel
>>>>> whose build tree is around 2.7GB in tgz format.
>>>>>
>>>>> How do I get their attention to stop the DDoS?  Or is this a war
>>>>> you cannot win?
>>>>
>>>> 'fail2ban' for the bots that don't respect robots.txt.
>>>
>>> Wasn't installed by this stretch version.  Is now, and I'm reading
>>> the man pages. Frankly this looks dangerous when run by beginning
>>> users. There ought to be a startup tutorial based on setting up the
>>> logging, then specifying who you want blocked based on what the
>>> logs show.  Is there a formal tutorial for setting this up
>>> someplace?
>>
>> Those are more hints than a howto:
>>
>> https://askubuntu.com/questions/1116001/block-badbot-with-fail2ban-via-user-agents-in-access-log
>> https://www.booleanworld.com/blocking-bad-bots-fail2ban/
>>
>> Or with Iptables:
>> https://blog.nintechnet.com/how-to-block-w00tw00t-at-isc-sans-dfind-and-other-web-vulnerability-scanners/
>> https://javapipe.com/blog/iptables-ddos-protection/
>>
>>
>> I guess I would implement both approaches.
>>
>>
>> Does your website really need to be available to the world?
>> Can't you consider a VPS with anti-DDoS capability?
>>
> Wouldn't have the foggiest how to set that up.  And/or how to set up a
> login/password.  What I have there is of very little interest to folks
> not running an rpi3b or rpi4b, or a TRS-80 Color Computer.
>
> What advantage would the VPS offer? And I likely won't have time to
> set it up, as I'm scheduled for a new aortic valve to be installed
> Tuesday. Mine is 85 yo and about worn out. Pumping efficiency is about
> 30% due to leakage.

I was talking about a paid solution like (1).

If you need to have some files accessible to the world, can't you use
Google Drive or a similar solution?

Basically, you only care about publishing the files you want to share,
and not about the rest! :)

1)  https://www.ovh.com/world/anti-ddos/
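
As for implementing both approaches from those links: here is a rough,
untested sketch of the fail2ban side, assuming Apache's combined log
format at /var/log/apache2/access.log. The filter name, the bot names,
and the ban times below are all placeholders; swap in whatever actually
shows up in your logs.

A filter, e.g. /etc/fail2ban/filter.d/apache-badbots-local.conf:

  [Definition]
  # Placeholder User-Agent fragments -- replace with the bots you see.
  badbots = MJ12bot|AhrefsBot|SemrushBot|DotBot
  # Combined log: host, request, status, size, referer, then user-agent.
  failregex = ^<HOST> .* "[^"]*" \d+ \S+ "[^"]*" "[^"]*(?:%(badbots)s)[^"]*"$
  ignoreregex =

And a jail in /etc/fail2ban/jail.local:

  [apache-badbots-local]
  enabled  = true
  port     = http,https
  filter   = apache-badbots-local
  logpath  = /var/log/apache2/access.log
  maxretry = 1
  findtime = 600
  bantime  = 86400

You can dry-run the filter against your real log before enabling the
jail:

  fail2ban-regex /var/log/apache2/access.log \
      /etc/fail2ban/filter.d/apache-badbots-local.conf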

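For the iptables side, a minimal sketch along the lines of those
articles; 192.0.2.0/24 is a placeholder range, and the rate numbers are
guesses you would tune against real traffic:

  # Drop web traffic from one known-bad /24 (placeholder range),
  # leaving any other services on the box reachable from it:
  iptables -I INPUT -s 192.0.2.0/24 -p tcp -m multiport --dports 80,443 -j DROP

  # Or rate-limit new HTTP(S) connections per source IP with hashlimit,
  # so normal visitors get through but bulk downloaders get dropped:
  iptables -A INPUT -p tcp --syn -m multiport --dports 80,443 \
    -m hashlimit --hashlimit-name webbots --hashlimit-mode srcip \
    --hashlimit-above 30/minute --hashlimit-burst 60 -j DROP

Note these rules don't survive a reboot on their own; on stretch you
would persist them with the iptables-persistent package or your own
script.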

--
John Doe

