
Re: robots.txt (was Re: Download a whole gopherhole using wget/curl?)



It was thus said that the Great cimejes once stated:
> >
> >   But if I wanted to block robots from crawling the black hole I created,
> >would the following actually work?
> >
> >		User-agent: *
> >		Disallow: BlackHole:
> 
> I used this:
> 
> User-agent: *
> Disallow: /
> User-agent: veronica
> Allow: /
> User-agent: eomyidae/0.3
> Allow: /

  If I wanted to block all bots from my gophersite, this might not actually
work.  The only reason this works for HTTP is that *all* requests to a
webserver MUST start with a '/' (technically, the resource requested MUST
start with a '/').  This makes checking a request against a Disallow:
rule easy: it's just a prefix match from the start of the resource.  Given

	Disallow: /

every request will match that prefix, simply because of how resources are
requested via HTTP.
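
  A rough sketch of that check in Python, purely illustrative (the function
name and example paths are just placeholders; a real robots.txt parser also
handles Allow:, wildcards, and so on):

	def is_disallowed(resource, rules):
	    # A Disallow: rule is a prefix match against the requested resource.
	    return any(resource.startswith(rule) for rule in rules)

	# Every HTTP path starts with '/', so "Disallow: /" matches them all.
	print(is_disallowed("/phlog/2020/01.html", ["/"]))  # True
	print(is_disallowed("/robots.txt",         ["/"]))  # True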

  This doesn't work for gopher.  Were I to add that, the *only* selectors
that would match are:

	/robots.txt
	/caps.txt

No other selector on my site starts with a '/', which is the problem (a
related problem is gopher clients assuming all selectors start with '/').
So again, if I wanted to block *all* selectors from bots, how do I do that?
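
  Running the same sketch over gopher selectors shows the mismatch (again
illustrative; the selectors below are made up, and whether any crawler even
applies robots.txt prefix matching to gopher selectors is the open question):

	def is_disallowed(selector, rules):
	    # Same prefix match as before, applied to gopher selectors.
	    return any(selector.startswith(rule) for rule in rules)

	print(is_disallowed("/robots.txt",        ["/"]))           # True
	print(is_disallowed("Phlog:2021/01/01.1", ["/"]))           # False -- still crawled
	print(is_disallowed("BlackHole:level1",   ["BlackHole:"]))  # True, if honored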

  -spc (Or even a subset of my site, like "BlackHole:"?)

