
Re: Stopping webcrawlers.



On Sun, Nov 03, 2019 at 10:04:46AM -0500, Gene Heskett wrote:
> Greetings all
> 
> I am developing a list of broken webcrawlers who are repeatedly 
> downloading my entire web site including the hidden stuff.
> 
> These crawlers/bots are ignoring my robots.txt

$ wget -O - https://www.shentel.com/robots.txt
--2019-11-03 15:22:35--  https://www.shentel.com/robots.txt
Resolving www.shentel.com (www.shentel.com)... 45.60.160.21
Connecting to www.shentel.com (www.shentel.com)|45.60.160.21|:443...  connected.
HTTP request sent, awaiting response... 403 Forbidden
2019-11-03 15:22:36 ERROR 403: Forbidden.

Allowing said bots to *see* your robots.txt would be a step in the
right direction.
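
For what it's worth, a well-behaved crawler fetches robots.txt first and
checks each URL against it before downloading. A minimal sketch with
Python's stdlib parser, using made-up rules and an example.com URL purely
for illustration:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only, not taken from any real site.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /hidden/",
])

# A compliant crawler asks before fetching each URL.
print(rp.can_fetch("SomeBot", "https://example.com/hidden/page"))  # False
print(rp.can_fetch("SomeBot", "https://example.com/public/page"))  # True
```

Of course, none of this helps against crawlers that never look at
robots.txt in the first place; compliance is entirely voluntary.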

Reco
