
Re: robots.txt (was Re: Download a whole gopherhole using wget/curl?)



It was thus said that the Great cimejes once stated:
> >
> >   But if I wanted to block robots from crawling the black hole I created,
> >would the following actually work?
> >
> >		User-agent: *
> >		Disallow: BlackHole:
> 
> I used this:
> 
> User-agent: *
> Disallow: /
> User-agent: veronica
> Allow: /
> User-agent: eomyidae/0.3
> Allow: /

  If I wanted to block all bots from my gophersite, this might not actually
work.  The only reason this works for HTTP is that *all* requests to a
webserver MUST start with a '/' (technically, the resource requested MUST
start with a '/').  This makes checking a request against a Disallow:
rule easy: it's just a prefix match from the start of the resource.  Given

	Disallow: /

every request will match that prefix, simply because of how resources are
requested via HTTP.
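
  A rough sketch of that check in Python, purely illustrative (the function
name and example paths are just placeholders; a real robots.txt parser also
handles Allow:, wildcards, and so on):

	def is_disallowed(resource, rules):
	    # A Disallow: rule is a prefix match against the requested resource.
	    return any(resource.startswith(rule) for rule in rules)

	# Every HTTP path starts with '/', so "Disallow: /" matches them all.
	print(is_disallowed("/phlog/2020/01.html", ["/"]))  # True
	print(is_disallowed("/robots.txt",         ["/"]))  # True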

  This doesn't work for gopher.  Were I to add that, the *only* selectors
that would match are:

	/robots.txt
	/caps.txt

No other selector on my site starts with a '/', which is the problem (a
related problem is gopher clients assuming all selectors start with '/').
So again, if I wanted to block *all* selectors from bots, how do I do that?
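
  Running the same sketch over gopher selectors shows the mismatch (again
illustrative; the selectors below are made up, and whether any crawler even
applies robots.txt prefix matching to gopher selectors is the open question):

	def is_disallowed(selector, rules):
	    # Same prefix match as before, applied to gopher selectors.
	    return any(selector.startswith(rule) for rule in rules)

	print(is_disallowed("/robots.txt",        ["/"]))           # True
	print(is_disallowed("Phlog:2021/01/01.1", ["/"]))           # False -- still crawled
	print(is_disallowed("BlackHole:level1",   ["BlackHole:"]))  # True, if honored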

  -spc (Or even a subset of my site, like "BlackHole:"?)

