
Re: [gopher] Spidering teh gopherspace



Wow. That option actually exists...

User-agent: Yahoo-Blogs
Crawl-delay: 20
Disallow: /tmp
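
For illustration, here is a minimal C sketch of how a spider might pick the
Crawl-delay value out of a robots.txt line like the one above. The function
names are hypothetical and this is not the code from Kim's spider; it also
ignores User-agent grouping, which a real parser would need to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: case-insensitive "does s start with prefix?" */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper: returns the delay in seconds if `line` is a
 * "Crawl-delay: N" directive, or -1 if it is anything else. */
long parse_crawl_delay(const char *line)
{
    static const char key[] = "crawl-delay:";
    char *end;
    long secs;

    assert(line != NULL);

    /* Skip leading whitespace before the directive name. */
    while (isspace((unsigned char)*line))
        line++;
    if (!has_prefix_nocase(line, key))
        return -1;
    line += sizeof key - 1;

    /* strtol skips whitespace after the colon; end == line means no digits. */
    secs = strtol(line, &end, 10);
    if (end == line || secs < 0)
        return -1;
    return secs;
}
```

Returning -1 for "not a Crawl-delay line" lets the caller tell a missing
directive apart from an explicit delay of 0.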

I'll implement that asap.

[5 minutes later]

Done.


- Kim



On 2010-04-14 17:10, Alex Nordlund wrote:
Suggestion:

Let the user decide in robots.txt the lowest acceptable delay.

(but make sure to have a good default delay)
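
One way to read this suggestion, sketched in C: use the delay from robots.txt
when there is one, and fall back to the spider's default otherwise. Treating
the default as a floor that robots.txt can raise but not lower is an
assumption of this sketch, as are the function name and the convention that
-1 means "no delay given".

```c
#include <assert.h>

/* Hypothetical helper: pick the delay (in seconds) to use for one server.
 * robots_delay is the value from robots.txt, or -1 if none was given.
 * default_delay is the spider's built-in delay, used here as a floor. */
long effective_delay(long robots_delay, long default_delay)
{
    assert(default_delay >= 0);

    if (robots_delay < 0)
        return default_delay;           /* nothing in robots.txt */
    if (robots_delay < default_delay)
        return default_delay;           /* never go faster than the default */
    return robots_delay;                /* admin asked for a longer delay */
}
```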

---
//Alex
"Look boys, the graphical part is done. Now we just have to code it!" --tdist



On Tue, Apr 13, 2010 at 4:40 PM, Kim Holviala <kim@holviala.com> wrote:
Yeah, me again.

Since I got the server done I promised to start writing clients. But
instead, I got sidetracked and decided to code a search engine first...

I got a basic spider up and running (pure POSIX C again) in a couple of
hours and successfully tested it against my own server. The server first
throttled me (a 1-second delay before each reply), and after I cleared the
sessions, inetd decided it had had enough and kicked me out entirely.

So, being slightly wiser, I inserted some delays into the spidering engine
to prevent it from killing the server being spidered.

And before anyone asks, yes, I'll make it support robots.txt.

Anyway, on to a few questions: What kind of spidering rate would the admins
here accept? The spider will index types 0 and 1 (text documents and menus)
and currently does three hits per second (actually, a hit followed by a
1/3-second delay). I think that's too fast, so how does one hit per second
sound? It'll take forever to spider things, but at least it wouldn't kill
anyone's server...
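
The "hit, then delay" loop described above could be sketched in POSIX C
roughly like this. The function names are hypothetical and this is only an
assumption about how such a pause might be written, not the spider's actual
code. A 333 ms pause gives at most about three hits per second; 1000 ms
gives at most one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: convert milliseconds to a struct timespec. */
struct timespec ms_to_timespec(long ms)
{
    struct timespec ts;

    assert(ms >= 0);
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return ts;
}

/* Hypothetical helper: sleep for `ms` milliseconds between hits.
 * Restarts the sleep if a signal interrupts it. Returns 0 on success,
 * -1 on any error other than EINTR. */
int polite_pause(long ms)
{
    struct timespec req = ms_to_timespec(ms);
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;  /* sleep only for the time that was left */
    }
    return 0;
}
```

The spider would call polite_pause() once after each request, with the delay
chosen per server.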

I'm also thinking about bandwidth limiting, but I need to see if that's
possible (being on the receiving end).
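
One common approach on the receiving end (an assumption here, not something
decided in this thread) is to read the reply in small chunks and pause
between reads. Once the local socket buffer fills up, TCP flow control makes
the sender slow down. The arithmetic for the pause can be sketched in C as
below; the function name and the byte-rate parameter are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: how many milliseconds to pause after reading
 * `bytes_read` bytes so that the average rate stays at or below
 * `max_bytes_per_sec`. Intended for small chunk sizes. */
long throttle_pause_ms(long bytes_read, long max_bytes_per_sec)
{
    assert(bytes_read >= 0);
    assert(max_bytes_per_sec > 0);

    /* time = bytes / rate, scaled to milliseconds (rounded down) */
    return (bytes_read * 1000L) / max_bytes_per_sec;
}
```

For example, after reading a 4096-byte chunk with a cap of 8192 bytes per
second, the spider would pause for 500 ms before the next read.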


- Kim


_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/gopher-project

