Re: robots.txt (was Re: Download a whole gopherhole using wget/curl?)
The idea is that the owners of the crawlers read robots.txt and respect the directives contained within.
HTTP robots.txt works the same way: the onus is on the crawler code.
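To make that concrete, here is a rough Python sketch of what a polite gopher crawler might do before fetching anything else. The host name, port, and the very naive parsing are placeholders of mine, not how eomyidae (or any real crawler) actually works:

import socket

def gopher_fetch(host, port, selector):
    # A Gopher request is just the selector terminated by CRLF.
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def disallowed_prefixes(robots_txt):
    # Collect Disallow values from a robots.txt body (very naive parsing).
    prefixes = []
    for line in robots_txt.decode("utf-8", "replace").splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:                     # a bare "Disallow:" blocks nothing
                prefixes.append(value)
    return prefixes

def allowed(selector, prefixes):
    # "Disallow: *" strips down to "" and therefore matches every selector.
    return not any(selector.startswith(p.rstrip("*")) for p in prefixes)

# Placeholder host; a real crawler would do this once per server it visits.
rules = disallowed_prefixes(gopher_fetch("gopher.example.org", 70, "/robots.txt"))
if allowed("/some/selector", rules):
    document = gopher_fetch("gopher.example.org", 70, "/some/selector")

Nothing in the protocol enforces any of this; the crawler has to choose to ask for /robots.txt and honour the answer.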
On 29 Nov 2019 12:42 pm, James Mills <prologic@shortcircuit.net.au> wrote:
Silly question, but isn't the User-Agent kind of useless here, since a Gopher request is basically just a selector for a resource?
There are no headers
No User-Agent to identify a request
What am I missing here :)
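For comparison, roughly what goes over the wire in each case (host name and bot name are made up; the point is only that the Gopher request has nowhere to carry a User-Agent):

# What an HTTP crawler can send vs. what a Gopher client sends.
http_request = (
    b"GET /robots.txt HTTP/1.1\r\n"
    b"Host: example.org\r\n"
    b"User-Agent: examplebot/1.0\r\n"   # room to identify the crawler
    b"\r\n"
)

gopher_request = b"/robots.txt\r\n"     # the entire Gopher request: selector + CRLF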
Kind Regards
James
James Mills / prologic
E: prologic@shortcircuit.net.au
W: prologic.shortcircuit.net.au
On Fri, Nov 29, 2019 at 3:39 PM Sean Conner <sean@conman.org> wrote:
It was thus said that the Great Christoph Lohmann once stated:
> Good point. In eomyidae you have two possibilities:
>
> User-Agent: *
> Disallow: *
Okay, but this diverges from the HTTP version of robots.txt (from my
understanding, unless it's been updated since I was last dealing with this
stuff).
> and
>
> User-Agent: *
> Disallow:
This actually has a different meaning from the HTTP version: there it
means "all robots are allowed to crawl" (back from when robots.txt was
first developed).
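A toy check, just to make the two forms concrete (the parsing here is my own illustration, not what eomyidae or any HTTP robot actually does):

def crawl_all_forbidden(robots_txt):
    # True for "Disallow: *" (nothing may be crawled, the eomyidae form),
    # False for a bare "Disallow:" (the original HTTP meaning: crawl anything).
    for line in robots_txt.splitlines():
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value == "*":
                return True
    return False

print(crawl_all_forbidden("User-Agent: *\nDisallow: *\n"))  # True
print(crawl_all_forbidden("User-Agent: *\nDisallow:\n"))    # False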
-spc