Re: robots.txt (was Re: Download a whole gopherhole using wget/curl?)
The idea is that the owners of the crawlers read robots.txt and respect the directives contained within.
HTTP robots.txt works the same way: the onus is on the crawler code.
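To make that concrete, here is a rough Python sketch of what a polite gopher crawler might do before fetching anything else. The host name, port, and the very naive parsing are placeholders of mine, not how eomyidae (or any real crawler) actually works:

import socket

def gopher_fetch(host, port, selector):
    # A Gopher request is just the selector terminated by CRLF.
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def disallowed_prefixes(robots_txt):
    # Collect Disallow values from a robots.txt body (very naive parsing).
    prefixes = []
    for line in robots_txt.decode("utf-8", "replace").splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:                     # a bare "Disallow:" blocks nothing
                prefixes.append(value)
    return prefixes

def allowed(selector, prefixes):
    # "Disallow: *" strips down to "" and therefore matches every selector.
    return not any(selector.startswith(p.rstrip("*")) for p in prefixes)

# Placeholder host; a real crawler would do this once per server it visits.
rules = disallowed_prefixes(gopher_fetch("gopher.example.org", 70, "/robots.txt"))
if allowed("/some/selector", rules):
    document = gopher_fetch("gopher.example.org", 70, "/some/selector")

Nothing in the protocol enforces any of this; the crawler has to choose to ask for /robots.txt and honour the answer.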
On 29 Nov 2019 12:42 pm, James Mills <prologic@shortcircuit.net.au> wrote:
Silly question, but isn't the User-Agent kind of useless here, since a Gopher request is basically just a selector for a resource?
There are no headers
No User-Agent to identify a request
What am I missing here :)
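For comparison, roughly what goes over the wire in each case (host name and bot name are made up; the point is only that the Gopher request has nowhere to carry a User-Agent):

# What an HTTP crawler can send vs. what a Gopher client sends.
http_request = (
    b"GET /robots.txt HTTP/1.1\r\n"
    b"Host: example.org\r\n"
    b"User-Agent: examplebot/1.0\r\n"   # room to identify the crawler
    b"\r\n"
)

gopher_request = b"/robots.txt\r\n"     # the entire Gopher request: selector + CRLF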
Kind Regards
James
James Mills / prologic
E: prologic@shortcircuit.net.au
W: prologic.shortcircuit.net.au
On Fri, Nov 29, 2019 at 3:39 PM Sean Conner <sean@conman.org> wrote:
It was thus said that the Great Christoph Lohmann once stated:
> Good point. In eomyidae you have two possibilities:
>
> User-Agent: *
> Disallow: *
Okay, but this diverges from the HTTP version of robots.txt (from my
understanding, unless it's been updated since I was last dealing with this
stuff).
> and
>
> User-Agent: *
> Disallow:
This actually has a different meaning from the HTTP version: there it
means "all robots are allowed to crawl" (back from when robots.txt was
first developed).
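A toy check, just to make the two forms concrete (the parsing here is my own illustration, not what eomyidae or any HTTP robot actually does):

def crawl_all_forbidden(robots_txt):
    # True for "Disallow: *" (nothing may be crawled, the eomyidae form),
    # False for a bare "Disallow:" (the original HTTP meaning: crawl anything).
    for line in robots_txt.splitlines():
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value == "*":
                return True
    return False

print(crawl_all_forbidden("User-Agent: *\nDisallow: *\n"))  # True
print(crawl_all_forbidden("User-Agent: *\nDisallow:\n"))    # False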
-spc