
Re: Stopping webcrawlers.



On Sunday 03 November 2019 10:23:50 Reco wrote:

> On Sun, Nov 03, 2019 at 10:04:46AM -0500, Gene Heskett wrote:
> > Greetings all
> >
> > I am developing a list of broken webcrawlers who are repeatedly
> > downloading my entire web site including the hidden stuff.
> >
> > These crawlers/bots are ignoring my robots.txt
>
> $ wget -O - https://www.shentel.com/robots.txt
> --2019-11-03 15:22:35--  https://www.shentel.com/robots.txt
> Resolving www.shentel.com (www.shentel.com)... 45.60.160.21
> Connecting to www.shentel.com (www.shentel.com)|45.60.160.21|:443... connected.
> HTTP request sent, awaiting response... 403 Forbidden
> 2019-11-03 15:22:36 ERROR 403: Forbidden.
>
> Allowing said bots to *see* your robots.txt would be a step into the
> right direction.
>
> Reco
But you are asking for shentel.com/robots.txt, and shentel.com is my ISP,
not my web site. You should be asking for

http://geneslinuxbox.net:6309/gene/robots.txt
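
Once the right file has been fetched, one way to sanity-check what it tells a well-behaved crawler is to run it through Python's standard urllib.robotparser. This is only a rough sketch: the robots.txt text below is a made-up sample (the real file is not shown in this thread), and "ExampleBot" and the paths under /gene/ are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only; the real file
# served at the URL above is not shown in this thread.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /gene/private/
"""

BASE = "http://geneslinuxbox.net:6309"


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and both paths are assumptions, not names from the thread.
print(allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", BASE + "/gene/private/notes.html"))  # False
print(allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", BASE + "/gene/index.html"))  # True
```

This only shows what the file asks for. A crawler that ignores robots.txt will not be stopped by it, and a crawler that gets a 403 for the file never sees the rules at all.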

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>

