
Re: [gopher] Spidering the gopherspace



Hi Cameron,

Throttling and randomized server polling are on my todo list, as is robots.txt support. For the time being, my crawler is still very rudimentary and might come across as a little rude. Work in progress :)
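
For illustration only, here is a minimal Python sketch of what throttling combined with randomized server polling could look like. It is not the crawler's actual code: the class name PoliteScheduler, the 120-second default delay and the injected clock are all assumptions made for the example, and it keys on host name rather than on resolved IP.

```python
import random
import time


class PoliteScheduler:
    """Pick the next host at random, skipping hosts visited too recently."""

    def __init__(self, min_delay_seconds=120.0, clock=time.monotonic, rng=None):
        self.min_delay = min_delay_seconds  # assumed value, not from the thread
        self.clock = clock                  # injectable so it can be tested
        self.rng = rng or random.Random()
        self.last_visit = {}                # host -> time of the last request

    def ready_hosts(self, hosts):
        """Return the hosts whose minimum delay has already elapsed."""
        now = self.clock()
        return [
            h for h in hosts
            if h not in self.last_visit
            or now - self.last_visit[h] >= self.min_delay
        ]

    def next_host(self, hosts):
        """Return a random host that may be polled now, or None if all must wait."""
        ready = self.ready_hosts(hosts)
        if not ready:
            return None
        host = self.rng.choice(ready)
        self.last_visit[host] = self.clock()
        return host
```

A crawl loop would call next_host() with its list of known servers and sleep briefly whenever it returns None, so that no single server sees back-to-back requests.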

cheers,
Mateusz



On 12/29/2014 12:42 AM, Cameron Kaiser wrote:
Mateusz,

I have been spidering the gopherspace for a few days, collecting things for a
new project I will publish soon.

If you see 178.170.108.36 hammering your gopher server, that would be me.

Do not hesitate to complain if you notice I'm too hard on you; I will
try to spare your server then (I already see that sdf.org has
blacklisted me).

FWIW, I throttled to several minutes between requests to the same IP (or would
find another to visit in the meantime) and I always honour robots.txt if it
can be fetched (and cache it).
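
As a rough sketch of the "honour robots.txt if it can be fetched, and cache it" idea, the Python below parses each host's robots.txt once and reuses the result. It is not V-2's implementation: RobotsCache, the fetch_text callable (which would retrieve the file over gopher and return None on failure) and the "example-crawler" agent name are hypothetical, and selectors are assumed to be path-like and to start with "/".

```python
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Fetch each host's robots.txt at most once and answer allow/deny queries."""

    def __init__(self, fetch_text, user_agent="example-crawler"):
        # fetch_text is a hypothetical callable: host -> robots.txt text, or None
        self.fetch_text = fetch_text
        self.user_agent = user_agent
        self.parsers = {}  # host -> RobotFileParser, or None if nothing was fetched

    def allowed(self, host, selector):
        """Return True if the selector may be fetched from this host."""
        if host not in self.parsers:
            text = self.fetch_text(host)
            parser = None
            if text is not None:
                parser = RobotFileParser()
                parser.parse(text.splitlines())
            self.parsers[host] = parser  # cache failures too, to avoid refetching
        parser = self.parsers[host]
        if parser is None:
            return True  # no robots.txt could be fetched, so nothing to honour
        return parser.can_fetch(self.user_agent, selector)
```

Caching the "could not be fetched" outcome as well keeps the crawler from asking a server for a missing robots.txt on every visit.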

However, since V-2 only fetches menus and has a well-known reverse DNS, I
imagine sites are a little friendlier to me.


_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/gopher-project



