
Re: Stopping webcrawlers.



On 11/3/2019 4:04 PM, Gene Heskett wrote:
> Greetings all
>
> I am developing a list of broken webcrawlers who are repeatedly
> downloading my entire web site including the hidden stuff.
>
> These crawlers/bots are ignoring my robots.txt files and aren't just
> indexing the site, but are downloading every single bit of every file
> there.
>
> This is burning up my upload bandwidth and constitutes a DDOS when 4 or 5
> bots all go into this pull it all mode at the same time.
>
> How do I best deal with these poorly written bots? I can target the
> individual addresses of course, but have chosen to block the /24, though
> that seems not to bother them for more than 30 minutes. It's also too
> broad a brush, blocking legit addresses' access. Restarting apache2 also
> works, for half an hour or so, but I may be interrupting a legit request
> for a realtime kernel whose build tree is around 2.7GB in tgz format.
>
> How do I get their attention to stop the DDOS?  Or is this a war you
> cannot win?
>

'fail2ban' for the bots that do not respect robots.txt.
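
A minimal sketch of such a setup, assuming Apache's combined access log at its
usual Debian location; the jail/filter name, the example disallowed paths, and
the thresholds are all placeholders to adjust for your site:

```
# /etc/fail2ban/filter.d/apache-badbots-custom.conf
# Ban hosts requesting paths your robots.txt disallows.
# Replace /hidden and /private with your actual disallowed directories.
[Definition]
failregex = ^<HOST> .* "(GET|POST|HEAD) /(hidden|private)[^"]*
ignoreregex =

# /etc/fail2ban/jail.d/apache-badbots-custom.conf
[apache-badbots-custom]
enabled  = true
port     = http,https
filter   = apache-badbots-custom
logpath  = /var/log/apache2/access.log
maxretry = 5
findtime = 600
bantime  = 86400
```

This bans only the offending IP (not a whole /24), and only after repeated
hits on paths no well-behaved crawler should touch, so legit visitors and
large legitimate downloads are left alone.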

--
John Doe
