
Re: robots.txt (was Re: Download a whole gopherhole using wget/curl?)



On 11/28/19 3:33 PM, Sean Conner wrote:
It was thus said that the Great Christoph Lohmann once stated:
Greetings.

For crawling, I already proposed some robots.txt for gopherspace,
for eomyidae and whatever follows it.

See gopher://gopherproject.org/1/eomyidae for the details. All
automatic crawlers or archivers should keep to this standard.

   I have a question about this, and it relates to the section I added last
night to my gopherhole.  I read the document given above, and in it I
found:

	Now put into this file:

		User-agent: eomyidae/0.3
		Disallow: /

	Or to disallow all crawlers:

		User-agent: *
		Disallow: /

   That follows directly from the standard for HTTP, but gopher isn't HTTP.
I'm asking because very few selectors in my gopherhole start with a '/' [1],
so this doesn't really work for me if I wanted to block all crawlers from my
site (which I don't do [2]).

   But if I wanted to block robots from crawling the black hole I created,
would the following actually work?

		User-agent: *
		Disallow: BlackHole:
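
Whether that works depends entirely on how a given crawler matches
Disallow values against selectors.  Here is a minimal sketch in Python,
assuming a crawler does plain prefix matching of Disallow values against
the raw gopher selector (an illustration only, not eomyidae's actual code;
the agent names and selectors are made up):

	def parse_robots(text):
	    # Collect Disallow prefixes per User-agent group
	    # (simplified: one agent name per group).
	    rules = {}
	    agent = None
	    for line in text.splitlines():
	        line = line.split('#', 1)[0].strip()
	        if not line:
	            continue
	        field, _, value = line.partition(':')
	        field, value = field.strip().lower(), value.strip()
	        if field == 'user-agent':
	            agent = value
	            rules.setdefault(agent, [])
	        elif field == 'disallow' and agent is not None:
	            rules[agent].append(value)
	    return rules

	def allowed(rules, agent, selector):
	    # Blocked if any Disallow prefix of the matching group is a
	    # prefix of the raw selector; an empty Disallow blocks nothing.
	    prefixes = rules.get(agent, rules.get('*', []))
	    return not any(p and selector.startswith(p) for p in prefixes)

	rules = parse_robots("User-agent: *\nDisallow: BlackHole:\n")
	print(allowed(rules, 'somebot', 'BlackHole:0000'))  # False -> blocked
	print(allowed(rules, 'somebot', 'Phlog:2019.11'))   # True  -> crawlable

Under that kind of prefix matching the answer is yes: selectors need not
start with '/', so "Disallow: BlackHole:" blocks anything whose selector
begins with "BlackHole:".  Whether a real crawler treats Disallow as a raw
selector prefix is exactly the open question.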

   -spc (Who doesn't have a conventional gopherhole ... )

[1]	The only two are:

		/robots.txt
		/caps.txt

[2]	My robots.txt file:

		User-agent: *
		Disallow:


I used this:

User-agent: *
Disallow: /
User-agent: veronica
Allow: /
User-agent: eomyidae/0.3
Allow: /
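
That relies on per-agent groups: a crawler is supposed to use the group
that names it and fall back to "*" only otherwise, so veronica and
eomyidae/0.3 see "Allow: /" while everything else falls under
"Disallow: /".  A rough sketch of that resolution in Python (assuming the
site's selectors start with '/'; note that Allow and this resolution order
are not in the original 1994 convention, so support varies by crawler):

	GROUPS = {
	    '*':            {'Disallow': ['/'], 'Allow': []},
	    'veronica':     {'Disallow': [], 'Allow': ['/']},
	    'eomyidae/0.3': {'Disallow': [], 'Allow': ['/']},
	}

	def may_fetch(agent, selector):
	    g = GROUPS.get(agent, GROUPS['*'])   # a named group wins over "*"
	    allow = max((len(p) for p in g['Allow'] if selector.startswith(p)), default=-1)
	    block = max((len(p) for p in g['Disallow'] if selector.startswith(p)), default=-1)
	    return allow >= block                # longest match wins, Allow on a tie

	print(may_fetch('veronica', '/phlog/2019'))  # True  -> crawl everything
	print(may_fetch('somebot', '/phlog/2019'))   # False -> blocked by "*"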

--
Nathaniel Leveck
gopher://1436.ninja
https://leveck.us

