[gopher] Re: New Gopher Wayback Machine Bot
On Wed, Oct 12, 2005 at 04:45:56PM -0700, Cameron Kaiser wrote:
> > Cameron, floodgap.com seems to have some sort of rate limiting and keeps
> > giving me a Connection refused error after a certain number of documents
> > have been spidered.
> I'm a little concerned about your project since I do host a number of large
> subparts which are actually proxied services, and I think even a gentle bot
> going methodically through them would not be pleasant for the other side
> (especially if you mean to regularly update your snapshot).
Valid concern. I had actually already marked your site off-limits
because I noticed that. Incidentally, your robots.txt doesn't seem to
disallow anything -- might want to take a look at that ;-)
> I do support robots.txt, see
Do you happen to have the source code for that available? I've got
some questions for you that it could explain (or you could), such as:
1. Which would you use? (Do you expect URLs to be HTTP-escaped?)
Disallow: /Applications and Games
2. Do you assume that all Disallow patterns begin with a slash as they
do in HTTP, even if the Gopher selector doesn't?
3. Do you have any special code to handle the UMN case where
1/foo, /foo, and foo all refer to the same document?
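
One way a bot could sidestep all three questions is to normalize both the selector and the Disallow pattern before comparing them. The Python below is only a rough sketch of that idea, not a description of how floodgap.com or any existing server handles it. The names normalize, is_disallowed, and ITEM_TYPES are hypothetical. It assumes that patterns may or may not be HTTP-escaped, that a leading slash is optional, and that a single RFC 1436 item-type character followed by a slash is a UMN-style prefix. That last assumption is a heuristic: a real top-level directory named "1" would be misread.

```python
from typing import Iterable
from urllib.parse import unquote

# Item-type characters defined in RFC 1436. Treating one of these followed
# by "/" as a UMN-style prefix is an assumption of this sketch.
ITEM_TYPES = set("0123456789+TgI")


def normalize(selector: str) -> str:
    """Reduce a Gopher selector or Disallow pattern to one comparable form."""
    # Undo HTTP-style escaping so "%20" and a literal space compare equal.
    s = unquote(selector)
    # Drop a UMN-style item-type prefix such as the "1" in "1/foo".
    if len(s) >= 2 and s[0] in ITEM_TYPES and s[1] == "/":
        s = s[1:]
    # Treat "/foo" and "foo" as the same document.
    return s.lstrip("/")


def is_disallowed(selector: str, disallow_patterns: Iterable[str]) -> bool:
    """Prefix-match a selector against Disallow patterns."""
    target = normalize(selector)
    for pattern in disallow_patterns:
        if pattern == "":
            continue  # An empty Disallow line permits everything.
        if target.startswith(normalize(pattern)):
            return True
    return False
```

With this sketch, is_disallowed("1/Applications and Games/chess", ["/Applications%20and%20Games"]) is true, and the same holds if the pattern is written with literal spaces or without the leading slash. The escaped pattern here is only an illustration.
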
I will be adding robots.txt support to my bot and restarting it shortly.
Author, Foundations of Python Network Programming