Re: robots.txt (was Re: Download a whole gopherhole using wget/curl?)
It was thus said that the Great cimejes once stated:
> >
> > But if I wanted to block robots from crawling the black hole I created,
> > would the following actually work?
> >
> > User-agent: *
> > Disallow: BlackHole:
>
> I used this:
>
> User-agent: *
> Disallow: /
> User-agent: veronica
> Allow: /
> User-agent: eomyidae/0.3
> Allow: /
If I wanted to block all bots from my gophersite, this wouldn't actually
work. The only reason this works for HTTP is that *all* requests to a
webserver MUST start with a '/' (technically, the resource requested MUST
start with a '/'). This makes checking a request against the Disallow:
header easy---it's just a partial match from the start. Given
Disallow: /
every request will partially match that due to how resources are requested
via HTTP.
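  A minimal sketch of that prefix check, in Python (the function name and the
example paths are just illustrative, not any particular crawler's code):

  def is_disallowed(resource, disallow_prefix):
      # robots.txt "Disallow:" matching is just a prefix test against
      # the requested resource.
      return resource.startswith(disallow_prefix)

  # Every HTTP request path starts with '/', so "Disallow: /" matches them all:
  print(is_disallowed("/index.html", "/"))    # True
  print(is_disallowed("/phlog/2019/", "/"))   # True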
This doesn't work for gopher. Were I to add that, the *only* selectors
that would match are:
/robots.txt
/caps.txt
All other selectors on my site don't start with a '/', which is the problem
(a related problem is gopher clients assuming all selectors start with '/').
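  Running the same prefix check over a handful of selectors shows why only
those two match (the non-'/' selectors here are made-up stand-ins):

  # "Disallow: /" only catches the selectors that happen to begin with '/':
  for selector in ("/robots.txt", "/caps.txt", "BlackHole:trap", "Phlog:2019"):
      print(selector, selector.startswith("/"))
  # -> True for /robots.txt and /caps.txt, False for everything else,
  #    so a bot honoring that rule still crawls the rest of the site.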
So again, if I wanted to block *all* selectors from bots, how do I do that?
-spc (Or even a subset of my site, like "BlackHole:"?)