
[gopher] Spidering the gopherspace



Yeah, me again.

Since the server is done, I promised to start writing clients. Instead, I got sidetracked and decided to code a search engine first...

I got a basic spider up and running (pure POSIX C again) in a couple of hours and successfully tested it against my own server. The server first throttled me (a 1-second delay before each reply), and after I cleared the sessions, inetd decided it had had enough and kicked me out entirely.

So, being slightly wiser, I inserted some delays into the spidering engine to keep it from killing the server being spidered.
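
The delay itself is nothing fancier than a short sleep between requests. A minimal sketch of the idea (fetch_selector() is just a placeholder, not the real request code):

    #include <time.h>

    /* Sleep for ms milliseconds; nanosleep() is POSIX and, unlike
     * sleep(), handles sub-second delays. */
    static void polite_sleep(long ms)
    {
        struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }

    /* In the crawl loop:
     *     fetch_selector(host, port, selector);
     *     polite_sleep(delay_ms);
     */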

And before anyone asks, yes, I'll make it support robots.txt.
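
Since robots.txt in gopherspace is by convention just a type 0 file behind the selector "robots.txt", fetching it is one selector line and a read-until-EOF. A rough sketch (error handling abbreviated, and the actual rule parsing left out):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    static int fetch_robots(const char *host)
    {
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof hints);
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, "70", &hints, &res) != 0)
            return -1;

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
            if (fd >= 0)
                close(fd);
            freeaddrinfo(res);
            return -1;
        }
        freeaddrinfo(res);

        /* A Gopher request is just the selector plus CRLF. */
        write(fd, "robots.txt\r\n", 12);

        char buf[1024];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, stdout);  /* feed the Disallow parser here */

        close(fd);
        return 0;
    }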

Anyway, on to a few questions: what kind of spidering rate would the admins here accept? The spider will index types 0 and 1 (text documents and menus) and currently does three hits per second (actually, a hit followed by a 1/3-second delay). I think that's too fast, so how does one hit per second sound? It'll take forever to spider everything, but at least it wouldn't kill anyone's server...
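
One refinement worth noting: a fixed post-request delay overshoots the target interval by however long the request itself takes, so pacing to the interval could look more like this (again just a sketch; fetch_selector() is hypothetical):

    #include <time.h>

    /* Milliseconds between two monotonic timestamps. */
    static long elapsed_ms(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1000L
             + (b.tv_nsec - a.tv_nsec) / 1000000L;
    }

    /* Sleep for whatever is left of the interval after a request. */
    static void pace(struct timespec start, long interval_ms)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);

        long left = interval_ms - elapsed_ms(start, now);
        if (left > 0) {
            struct timespec ts = { left / 1000, (left % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }

    /* In the crawl loop:
     *     struct timespec t;
     *     clock_gettime(CLOCK_MONOTONIC, &t);
     *     fetch_selector(host, port, selector);
     *     pace(t, 1000);    (1000 ms = one hit per second)
     */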

I'm also thinking about bandwidth limiting, but I need to see whether that's even possible from the receiving end.
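
One approach that might work: read in small chunks and sleep between reads; TCP flow control should then slow the sender down once the socket's receive buffer fills. A rough sketch of that idea (handle() is a placeholder for whatever consumes the data):

    #include <time.h>
    #include <unistd.h>

    /* Read from fd at roughly max_bps bytes per second by taking
     * small chunks and sleeping between them; the kernel's receive
     * buffer plus TCP flow control throttle the sender for us. */
    static ssize_t read_throttled(int fd, long max_bps)
    {
        char buf[512];
        long long gap_ns = 512LL * 1000000000LL / max_bps;
        ssize_t n, total = 0;

        while ((n = read(fd, buf, sizeof buf)) > 0) {
            /* handle(buf, n); */
            total += n;
            struct timespec ts = { gap_ns / 1000000000LL,
                                   gap_ns % 1000000000LL };
            nanosleep(&ts, NULL);
        }
        return n < 0 ? n : total;
    }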


- Kim





