
Re: [gopher] Spidering teh gopherspace



Suggestion:

Let the server admin specify the lowest acceptable delay in robots.txt.

(but make sure the spider falls back to a sensible default delay when none is given)
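
For example, something like the Crawl-delay line used in web robots.txt files could carry that. It's a de-facto extension, not part of the original robots.txt spec, and whether gopher spiders would honour it is purely my assumption here:

    # hypothetical robots.txt served by a gopher server
    User-agent: *
    Crawl-delay: 2

The spider would then treat 2 seconds as the minimum time between hits to that server, and fall back to its own default whenever the line is missing.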

---
//Alex
"Look boys, the graphical part is done. Now we just have to code it!" --tdist



On Tue, Apr 13, 2010 at 4:40 PM, Kim Holviala <kim@holviala.com> wrote:
> Yeah, me again.
>
> Since I got the server done I promised to start writing clients. But
> instead, I got sidetracked and decided to code a search engine first...
>
> I got a basic spider up and running (pure POSIX C again) in a couple of
> hours and successfully tested it against my own server, which first
> throttled me (a 1-second delay before replying); then, after I cleared
> the sessions, inetd decided it had had enough and kicked me out completely.
>
> So, being slightly wiser, I inserted some delays into the spidering
> engine to keep it from killing the server being spidered.
>
> And before anyone asks, yes, I'll make it support robots.txt.
>
> Anyway, on to a few questions: what kind of spidering rate would the admins
> here accept? The spider will index types 0 and 1 (text documents and menus)
> and currently does three hits per second (actually, a hit followed by a
> 1/3-second delay). I think that's too fast, so how does one hit per second
> sound? It'll take forever to spider things, but at least it wouldn't kill
> anyone's server....
>
> I'm also thinking about bandwidth limiting, but I need to see if that's
> possible (being on the receiving end).
>
>
> - Kim
>
>
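The per-hit delay Kim describes (a hit followed by a fixed pause) is basically just a nanosleep() between fetches. A minimal sketch in POSIX C, where fetch_selector() is only a made-up stand-in for whatever the spider actually does per document:

    #include <errno.h>
    #include <time.h>

    /* Pause between hits; if a signal interrupts the sleep, resume it
     * with the remaining time so the full delay is always honoured. */
    static void throttle(time_t sec, long nsec)
    {
        struct timespec ts;
        ts.tv_sec  = sec;
        ts.tv_nsec = nsec;
        while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
            ;
    }

    /*
     * Hypothetical crawl loop (names invented for illustration):
     *
     *     for (i = 0; i < nselectors; i++) {
     *         fetch_selector(host, port, selectors[i]);
     *         throttle(1, 0);        // one hit per second
     *     }
     */

Dropping the pause to throttle(0, 333333333L) gives roughly the three-hits-per-second rate Kim mentions.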
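Bandwidth limiting from the receiving end also looks doable, at least crudely: read the reply in capped chunks and pause between reads, and TCP flow control slows the sender down to match. A rough, self-contained sketch, with function and parameter names invented here rather than taken from Kim's code:

    #include <errno.h>
    #include <time.h>
    #include <unistd.h>

    /* Read up to 'len' bytes from fd at roughly 'cap' bytes per second:
     * one cap-sized chunk, then a one-second pause before the next.
     * Returns the number of bytes read, or -1 on error. */
    static long read_throttled(int fd, char *buf, size_t len, size_t cap)
    {
        size_t got = 0;

        while (got < len) {
            size_t want = (len - got < cap) ? len - got : cap;
            ssize_t n = read(fd, buf + got, want);

            if (n < 0)
                return -1;
            if (n == 0)
                break;                           /* EOF */
            got += (size_t)n;

            struct timespec ts;
            ts.tv_sec  = 1;
            ts.tv_nsec = 0;
            while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
                ;
        }
        return (long)got;
    }

It wastes one extra second of sleep after the final chunk, but for a polite spider that hardly matters.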

_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/gopher-project



