Re: please use robots.txt for your gopher apps



Replying to several messages at once:

- The robot checks both "robots.txt" and "0/robots.txt". It tries the first
  because almost every server will interpret a selector of "robots.txt" as
  a file in its root; it tries the second for UMN and UMN-alike gopherds,
  which like to have the itemtype repeated in the selector. The first takes
  precedence. If there is a need for /robots.txt or some other variation,
  I'm not opposed, but it should be justified so I don't hit every server
  with a useless request when the cache expires. (A sketch of the
  two-selector probe follows this list.)

- Please, no regexes, just globs. PCREs in particular are actually Turing-
  complete, and I'd prefer not to be running user-written unbounded
  automata :( (A glob-matching sketch also follows this list.)

- Right now the robot just looks at Disallow: (and allows multiple
  Disallow:s). I can add Allow: support relatively easily; it just might
  not happen immediately. There is currently an implied "Allow: *". (See
  the parsing sketch after this list.)

- Supporting Sitemap: is a ways off and would probably need to be
  Gopher-specific. I'm open to design options but wouldn't implement
  anything until there is broad consensus about how that should look.
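
For the curious, here is roughly what the two-selector probe looks like.
This is a minimal Python sketch, not V-2's actual code; gopher_fetch and
fetch_robots are names made up for illustration, and the error-item check
at the end is deliberately crude:

import socket

def gopher_fetch(host, selector, port=70, timeout=30):
    # One Gopher round trip: send the selector terminated by CRLF,
    # then read the reply until the server closes the connection.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(selector.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def fetch_robots(host):
    # "robots.txt" takes precedence; "0/robots.txt" is the fallback
    # for UMN-style servers that repeat the itemtype in the selector.
    for selector in ("robots.txt", "0/robots.txt"):
        try:
            body = gopher_fetch(host, selector)
        except OSError:
            continue
        # A missing file usually comes back as an itemtype-3 error
        # menu line; treat that as "no robots.txt here".
        if body and not body.startswith(b"3"):
            return body.decode("utf-8", errors="replace")
    return None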
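
Glob matching is simple and bounded; here is a sketch using Python's
fnmatchcase as a stand-in for whatever matcher the robot really uses. I'm
assuming patterns are globs over the whole selector rather than
robots.txt-style prefixes; the exact anchoring rules are the robot's call:

from fnmatch import fnmatchcase

def disallowed(selector, patterns):
    # Shell-style wildcards only: "*" matches any run of characters
    # and "?" matches exactly one. No backreferences, no recursion.
    return any(fnmatchcase(selector, pat) for pat in patterns)

print(disallowed("cgi-bin/lookup", ["cgi-bin/*"]))    # True
print(disallowed("docs/readme.txt", ["cgi-bin/*"]))   # False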
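
And a sketch of the Disallow:-only parse with the implied Allow: * --
multiple Disallow: lines simply accumulate, and anything no pattern
matches is permitted. The User-agent grouping and the "veronica" agent
token are my guesses for illustration, not a description of V-2's actual
parser:

def parse_robots(text, agent="veronica"):
    # Gather every Disallow: pattern from records whose User-agent
    # line matches this robot (or "*"). Anything not matched by a
    # collected pattern is allowed -- the implied Allow: *.
    disallows = []
    applies = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            applies = value == "*" or agent.lower() in value.lower()
        elif field == "disallow" and applies and value:
            disallows.append(value)
    return disallows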

Speaking only for V-2, and not for any other crawlers.

-- 
------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser@floodgap.com
-- The older a man gets, the farther he had to walk to school as a boy. -------

