RE: please use robots.txt for your gopher apps
> Speaking only for V-2, and not for any other crawlers.
I think it would be a good idea to have an informal, RFC-style document specifying how gopher crawlers should behave - that way everyone can design to the same standard.
A sitemap file could just be a list of selector URIs, one per line. Keep it simple.
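Purely as an illustration - these selectors are invented, not taken from any
real server - such a file could look like:

    0/about.txt
    0/docs/changelog.txt
    1/software
    1/archive/2019

A crawler would fetch the sitemap selector, split the response on newlines,
and queue each entry like any other selector it had discovered.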
-Matt
-----Original Message-----
From: Cameron Kaiser <spectre@floodgap.com>
Sent: 23 May 2019 01:29
To: gopher-project@other.debian.org
Cc: Cameron Kaiser <spectre@floodgap.com>
Subject: Re: please use robots.txt for your gopher apps
Replying to several messages at once:
- The robot checks both "robots.txt" and "0/robots.txt". The reason for those
two selectors is that almost every server will interpret a selector of
"robots.txt" as a file in its root; the second exists for UMN or UMN-alike
gopherds that like to have the itemtype repeated in the selector. The first
takes precedence (a rough fetch sketch follows this list). If there is a
need for /robots.txt or some other variation, I'm not opposed, but it should
be justified so I don't hit every server with a useless request when the
cache expires.
- Please, no regexes, just globs (a glob-matching sketch also follows this
list). PCREs in particular are actually Turing-complete and I'd prefer not
to be running user-written unbounded automata :(
- Right now the robot looks only at Disallow: (and accepts multiple
Disallow: lines). I can add Allow: support relatively easily; it just might
not be something done immediately. There is currently an implied Allow: *.
- Supporting Sitemap: is a ways off and would probably need to be
Gopher-specific. I'm open to design options but wouldn't implement
anything until there is broad consensus about how that should look.
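For discussion only, here is a rough sketch of the two-selector probe in
plain Python - it is not V-2's actual code, and it glosses over error menus,
caching, and politeness delays (gopher_fetch/fetch_robots are illustrative
names):

    import socket

    def gopher_fetch(host, selector, port=70, timeout=30):
        """Send one selector and return the raw response bytes."""
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(selector.encode("ascii", "replace") + b"\r\n")
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

    def fetch_robots(host):
        """Try "robots.txt" first, then "0/robots.txt"; the first hit wins."""
        for selector in ("robots.txt", "0/robots.txt"):
            try:
                body = gopher_fetch(host, selector)
            except OSError:
                continue
            # NOTE: a real crawler would also need to recognize gopher error
            # menus returned for missing files, not just empty bodies.
            if body:
                return body.decode("utf-8", "replace")
        return ""   # no robots.txt found: nothing is disallowed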
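Similarly, a minimal sketch of glob-based Disallow: handling with the
implied Allow: * - it ignores User-agent: sections and assumes each glob is
matched against the whole selector, so a prefix block would be written as
something like "Disallow: /private*":

    import fnmatch

    def parse_disallows(robots_text):
        """Collect Disallow: glob patterns; everything else is ignored."""
        patterns = []
        for line in robots_text.splitlines():
            line = line.split("#", 1)[0].strip()   # drop comments
            if line.lower().startswith("disallow:"):
                pattern = line.split(":", 1)[1].strip()
                if pattern:
                    patterns.append(pattern)
        return patterns

    def selector_allowed(selector, patterns):
        """Implied Allow: * - fetchable unless some Disallow glob matches."""
        return not any(fnmatch.fnmatch(selector, pat) for pat in patterns)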
Speaking only for V-2, and not for any other crawlers.
--
------------------------------------ personal: http://www.cameronkaiser.com/ --
Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser@floodgap.com
-- The older a man gets, the farther he had to walk to school as a boy. -------