RE: please use robots.txt for your gopher apps
> Speaking only for V-2, and not for any other crawlers.
I think it would be a good idea to have an informal, RFC-style document specifying how gopher crawlers should behave - that way everyone can design to the same standard.
A sitemap file could just be a list of selector URIs, one per line. Keep it simple.
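Purely as an illustration - these selectors are invented, not taken from any
real server - such a file could look like:

    0/about.txt
    0/docs/changelog.txt
    1/software
    1/archive/2019

A crawler would fetch the sitemap selector, split the response on newlines,
and queue each entry like any other selector it had discovered.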
-Matt
-----Original Message-----
From: Cameron Kaiser <spectre@floodgap.com>
Sent: 23 May 2019 01:29
To: gopher-project@other.debian.org
Cc: Cameron Kaiser <spectre@floodgap.com>
Subject: Re: please use robots.txt for your gopher apps
Replying to several messages at once:
- The robot checks both "robots.txt" and "0/robots.txt". The reason for those
two selectors is that almost every server will interpret a selector of
"robots.txt" as a file in its root; the second exists for UMN or UMN-alike
gopherds that like to have the itemtype repeated in the selector. The first
takes precedence (a rough fetch sketch follows this list). If there is a
need for /robots.txt or some other variation, I'm not opposed, but it should
be justified so I don't hit every server with a useless request when the
cache expires.
- Please, no regexes, just globs (a glob-matching sketch also follows this
list). PCREs in particular are actually Turing-complete and I'd prefer not
to be running user-written unbounded automata :(
- Right now the robot looks only at Disallow: (and accepts multiple
Disallow: lines). I can add Allow: support relatively easily; it just might
not be something done immediately. There is currently an implied Allow: *.
- Supporting Sitemap: is a ways off and would probably need to be
Gopher-specific. I'm open to design options but wouldn't implement
anything until there is broad consensus about how that should look.
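For discussion only, here is a rough sketch of the two-selector probe in
plain Python - it is not V-2's actual code, and it glosses over error menus,
caching, and politeness delays (gopher_fetch/fetch_robots are illustrative
names):

    import socket

    def gopher_fetch(host, selector, port=70, timeout=30):
        """Send one selector and return the raw response bytes."""
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(selector.encode("ascii", "replace") + b"\r\n")
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

    def fetch_robots(host):
        """Try "robots.txt" first, then "0/robots.txt"; the first hit wins."""
        for selector in ("robots.txt", "0/robots.txt"):
            try:
                body = gopher_fetch(host, selector)
            except OSError:
                continue
            # NOTE: a real crawler would also need to recognize gopher error
            # menus returned for missing files, not just empty bodies.
            if body:
                return body.decode("utf-8", "replace")
        return ""   # no robots.txt found: nothing is disallowed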
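Similarly, a minimal sketch of glob-based Disallow: handling with the
implied Allow: * - it ignores User-agent: sections and assumes each glob is
matched against the whole selector, so a prefix block would be written as
something like "Disallow: /private*":

    import fnmatch

    def parse_disallows(robots_text):
        """Collect Disallow: glob patterns; everything else is ignored."""
        patterns = []
        for line in robots_text.splitlines():
            line = line.split("#", 1)[0].strip()   # drop comments
            if line.lower().startswith("disallow:"):
                pattern = line.split(":", 1)[1].strip()
                if pattern:
                    patterns.append(pattern)
        return patterns

    def selector_allowed(selector, patterns):
        """Implied Allow: * - fetchable unless some Disallow glob matches."""
        return not any(fnmatch.fnmatch(selector, pat) for pat in patterns)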
Speaking only for V-2, and not for any other crawlers.
--
------------------------------------ personal: http://www.cameronkaiser.com/ --
Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser@floodgap.com
-- The older a man gets, the farther he had to walk to school as a boy. -------