
Re: please use robots.txt for your gopher apps



> Here's my write-up. Feel free to use in whatever way you want.
> Discussion, edits, clarifications, versioning, updates, I welcome it.
> Let's focus on succinctly specifying what V-2 expects, though.
> 
> gopher://alexschroeder.ch:70/02019-05-24_Robots_and_Gopher
> gophers://alexschroeder.ch:7443/02019-05-24_Robots_and_Gopher
> https://alexschroeder.ch/wiki/2019-05-24_Robots_and_Gopher
> 
> Personally, my firewall will block bots that don't have a crawl-delay of 1s,
> and I can't put that into the robots.txt as specified so if we're going to make
> amendments, that would be my first addition. :)

Thanks for doing this.

I don't mind Crawl-delay, though as a practical matter V-2 wouldn't
even come close to multiple requests within a single second (it won't
even be in the ballpark until I finish the multithreaded crawler, which
I confidently expect around the year 2040).
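
For anyone writing a faster robot, honoring a Crawl-delay can be as
simple as the sketch below. This is only an illustration and not V-2's
code; the function name and the 1-second default (taken from the quoted
firewall rule) are assumptions.

```python
def seconds_to_wait(last_request_at, now, crawl_delay=1.0):
    """Return how long to sleep before the next request to the same host.

    last_request_at and now are timestamps in seconds; crawl_delay is the
    minimum gap between requests (assumed 1 second here).
    """
    elapsed = now - last_request_at
    # Never return a negative wait once the delay has already passed.
    return max(0.0, crawl_delay - elapsed)
```

A crawler would call time.sleep() on the result before each request to a
given host.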

V-2 does actually understand User-agent and obeys 'veronica' and '*'. 
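
To make that concrete, here is a rough sketch of picking out the
Disallow lines that apply to a robot named 'veronica' or to '*'. It is
not V-2's actual parser. The function name is made up, and combining
both groups (rather than preferring the more specific one) is an
assumption on my part.

```python
def disallowed_for(robots_txt, agent="veronica"):
    """Collect Disallow values from groups naming `agent` or '*'."""
    rules = []
    applies = False    # does the current group apply to this robot?
    in_agents = False  # still reading consecutive User-agent lines?
    for raw in robots_txt.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything without a field.
        if not line or line.startswith("#") or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agents:
                # A User-agent line after rule lines starts a new group.
                applies = False
                in_agents = True
            if value.lower() in (agent, "*"):
                applies = True
        else:
            in_agents = False
            # An empty Disallow means "nothing is disallowed"; ignore it.
            if field == "disallow" and applies and value:
                rules.append(value)
    return rules
```

Given a file with one group for veronica, one for some other bot, and
one for '*', this returns the rules from the first and last groups only.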

I'm not opposed to adding ^ and $ for anchoring, but not more than that
for reasons already given (certainly no other regex metacharacters).
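
One possible reading of "^ and $ only", sketched below: a leading ^ and
a trailing $ anchor the pattern, and every other character is matched
literally. Treating an unanchored pattern as a substring match is purely
an assumption for this sketch (nothing here settles that), and
compile_rule is a made-up name.

```python
import re

def compile_rule(pattern):
    """Compile a rule in which only a leading ^ and a trailing $ are special."""
    prefix = suffix = ""
    if pattern.startswith("^"):
        prefix, pattern = r"\A", pattern[1:]
    if pattern.endswith("$"):
        suffix, pattern = r"\Z", pattern[:-1]
    # re.escape neutralizes every other regex metacharacter (., *, +, ...).
    return re.compile(prefix + re.escape(pattern) + suffix)

def rule_matches(pattern, selector):
    """True if the rule applies to the given selector."""
    return compile_rule(pattern).search(selector) is not None
```

With this reading, "^/cgi-bin" matches only selectors that start with
/cgi-bin, ".txt$" matches only selectors that end in a literal ".txt",
and "a.*b" matches only the literal string "a.*b".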

Somewhere in the community standard there should be a recommendation on
how often to fetch robots.txt (i.e., how long to cache it). Currently
V-2 rechecks robots.txt every 24 hours. If there is interest, I'm not
opposed to allowing this to be configurable by a site owner, but it should
only represent a minimum TTL (i.e., a robot could cache it longer), and
there should still be a recommendation on the default cache time when
unspecified. Feel free to pick whatever number out of the air you want, or
use the 24-hour timeout V-2 already implements.
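
A sketch of how the "minimum TTL" idea could work on the robot side.
The 24-hour default comes from what V-2 does today; everything else
(the names, and how a site owner would actually state a TTL) is
hypothetical, since no such field has been defined.

```python
DEFAULT_TTL = 24 * 60 * 60  # 24 hours in seconds, the default when unspecified

def effective_ttl(site_min_ttl=None, robot_ttl=DEFAULT_TTL):
    """How long the robot keeps a cached robots.txt, in seconds.

    site_min_ttl is a hypothetical site-owner value. It is only a floor:
    the robot may cache longer, but not for less time than the site asks.
    """
    if site_min_ttl is None:
        return robot_ttl
    return max(site_min_ttl, robot_ttl)

def needs_refetch(fetched_at, now, site_min_ttl=None):
    """True once the cached copy is at least effective_ttl seconds old."""
    return now - fetched_at >= effective_ttl(site_min_ttl)
```

So a site asking for one hour still gets rechecked every 24 hours by a
robot with the default, while a site asking for 48 hours is rechecked
no sooner than that.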

I still don't have a good idea what to do about hosts with user menus
and scripts.

-- 
------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser@floodgap.com
-- there's a dance or two in the old dame yet. -- mehitabel -------------------

