
Re: Stopping webcrawlers.



On Sun, Nov 03, 2019 at 10:48:58AM -0500, Gene Heskett wrote:
> On Sunday 03 November 2019 10:23:50 Reco wrote:
> 
> > On Sun, Nov 03, 2019 at 10:04:46AM -0500, Gene Heskett wrote:
> > > Greetings all
> > >
> > > I am developing a list of broken webcrawlers that are repeatedly
> > > downloading my entire web site, including the hidden stuff.
> > >
> > > These crawlers/bots are ignoring my robots.txt.
> >
> > $ wget -O - https://www.shentel.com/robots.txt
> > --2019-11-03 15:22:35--  https://www.shentel.com/robots.txt
> > Resolving www.shentel.com (www.shentel.com)... 45.60.160.21
> > Connecting to www.shentel.com (www.shentel.com)|45.60.160.21|:443... connected.
> > HTTP request sent, awaiting response... 403 Forbidden
> > 2019-11-03 15:22:36 ERROR 403: Forbidden.
> >
> > Allowing said bots to *see* your robots.txt would be a step in the
> > right direction.
> >
> But you are asking for shentel.com/robots.txt, which is my ISP.
> You should be asking for
> 
> http://geneslinuxbox.net:6309/gene/robots.txt

Wow. You, sir, owe me a new set of eyes.
I advise you to compare your monstrosity to this one (hint: it does
work) [1].
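
For comparison, a minimal well-formed robots.txt looks something like
this (a sketch only - the bot name and the path are placeholders, not
your actual layout):

  # Refuse one misbehaving crawler outright (placeholder name)
  User-agent: BadBot
  Disallow: /

  # Keep everyone else out of the hidden area (placeholder path)
  User-agent: *
  Disallow: /hidden/

Once it is in place, verify that it is actually reachable from the
outside, e.g.:

  $ wget -O - http://geneslinuxbox.net:6309/gene/robots.txt

Keep in mind robots.txt is purely advisory - crawlers that already
ignore it will keep ignoring it, so this only filters the well-behaved
ones.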

Reco

[1] https://enotuniq.net/robots.txt
