
Re: Stopping webcrawlers.



On Sun, Nov 03, 2019 at 10:48:58AM -0500, Gene Heskett wrote:
> On Sunday 03 November 2019 10:23:50 Reco wrote:
> 
> > On Sun, Nov 03, 2019 at 10:04:46AM -0500, Gene Heskett wrote:
> > > Greetings all
> > >
> > > I am developing a list of broken webcrawlers that are repeatedly
> > > downloading my entire web site, including the hidden stuff.
> > >
> > > These crawlers/bots are ignoring my robots.txt.
> >
> > $ wget -O - https://www.shentel.com/robots.txt
> > --2019-11-03 15:22:35--  https://www.shentel.com/robots.txt
> > Resolving www.shentel.com (www.shentel.com)... 45.60.160.21
> > Connecting to www.shentel.com (www.shentel.com)|45.60.160.21|:443... connected.
> > HTTP request sent, awaiting response... 403 Forbidden
> > 2019-11-03 15:22:36 ERROR 403: Forbidden.
> >
> > Allowing said bots to *see* your robots.txt would be a step in the
> > right direction.
> >
> But you are asking for shentel.com/robots.txt, which is my ISP.
> You should be asking for
> 
> http://geneslinuxbox.net:6309/gene/robots.txt

Wow. You, sir, owe me a new set of eyes.
I advise you to compare your monstrosity to this one (hint: it does
work) [1].
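
For comparison, a minimal well-formed robots.txt looks something like
this (a sketch only - the bot name and the path are placeholders, not
your actual layout):

  # Refuse one misbehaving crawler outright (placeholder name)
  User-agent: BadBot
  Disallow: /

  # Keep everyone else out of the hidden area (placeholder path)
  User-agent: *
  Disallow: /hidden/

Once it is in place, verify that it is actually reachable from the
outside, e.g.:

  $ wget -O - http://geneslinuxbox.net:6309/gene/robots.txt

Keep in mind robots.txt is purely advisory - crawlers that already
ignore it will keep ignoring it, so this only filters the well-behaved
ones.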

Reco

[1] https://enotuniq.net/robots.txt
