
[gopher] Re: New Gopher Wayback Machine Bot

On Wed, Oct 12, 2005 at 04:45:56PM -0700, Cameron Kaiser wrote:
> > Cameron, floodgap.com seems to have some sort of rate limiting and keeps
> > giving me a Connection refused error after a certain number of documents
> > have been spidered.
> I'm a little concerned about your project since I do host a number of large
> subparts which are actually proxied services, and I think even a gentle bot
> going methodically through them would not be pleasant for the other side
> (especially if you mean to regularly update your snapshot).

Valid concern.  I had actually already marked your site off-limits
because I noticed that.  Incidentally, your robots.txt doesn't seem to
disallow anything -- might want to take a look at that ;-)


> I do support robots.txt, see
> 	gopher.floodgap.com/0/v2/help/indexer

Do you happen to have the source code for that available?  I've got
some questions that it (or you) could answer, such as:

1. Which would you use?  (Do you expect URLs to be HTTP-escaped?)

   Disallow: /Applications and Games
   Disallow: /Applications%20and%20Games

2. Do you assume that all Disallow patterns begin with a slash as they
   do in HTTP, even if the Gopher selector doesn't?

3. Do you have any special code to handle the UMN case where
   1/foo, /foo, and foo all refer to the same document?
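
One way a bot could sidestep all three questions is to normalize both
the selector and each Disallow pattern before comparing them.  What
follows is only a minimal Python sketch under assumptions: the names
normalize and is_disallowed are made up for illustration, matching is
taken to be a plain prefix match, and nothing here is known to reflect
how the floodgap indexer actually behaves.

```python
from urllib.parse import unquote


def normalize(selector):
    """Reduce a selector or Disallow pattern to one canonical form."""
    # Question 1: decode %XX escapes so "%20" and a literal space compare equal.
    s = unquote(selector)
    # Question 3 (assumption): treat a leading "<digit>/" as a UMN-style
    # item-type prefix and drop the digit. A selector whose first path
    # component really is a single digit would be misread here.
    if len(s) >= 2 and s[0].isdigit() and s[1] == "/":
        s = s[1:]
    # Question 2: always compare with exactly one leading slash.
    return "/" + s.lstrip("/")


def is_disallowed(selector, patterns):
    """Return True if selector falls under any non-empty Disallow pattern."""
    sel = normalize(selector)
    # An empty "Disallow:" line disallows nothing, so skip empty patterns.
    return any(sel.startswith(normalize(p)) for p in patterns if p)
```

With this, 1/foo, /foo, and foo all normalize to the same string, and
either spelling of the "Applications and Games" rule blocks the same
selectors.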

I will be adding robots.txt support to my bot and restarting it shortly.


-- John

John Goerzen
Author, Foundations of Python Network Programming
