On Mon, Oct 31, 2005 at 09:42:03AM +0100, Alessandro Selli wrote:
> John Goerzen wrote:
> > Here's an update on the gopher bot:
> > 
> > There is currently 28G of data archived representing 386,315
> > documents.  1.3 million documents remain to be visited, from
> > approximately 20 very large Gopher servers.  I believe, then, that the
> > majority of gopher servers have been cached by this point.  3,987
> > different servers are presently represented in the archive.
>    Amazing.  I dare say: too good to be true!

Yes, you're right.  Sigh.

> Are you definitively, positively sure about all this stuff being served
> by so many active Gopher servers?

I forgot to take into account that the bot creates a directory for the
data from a given server before it tries to connect to it.  So the 3,987
figure counts the servers it tried to connect to, not the servers that
actually answered.

Actually, I received documents from 216 servers.  Sigh.
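
For anyone who wants to check a similar archive, here is a minimal
sketch of how the two counts can be told apart.  It assumes one
top-level directory per server under the archive root, which is a guess
at the layout; the function name count_servers is hypothetical and this
is not the bot's actual code.

```python
import os


def count_servers(archive_root):
    """Return (attempted, responded) for an archive directory.

    Assumed layout: one top-level directory per server.
    attempted = number of server directories (created before connecting).
    responded = server directories holding at least one file.
    """
    attempted = 0
    responded = 0
    with os.scandir(archive_root) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue
            attempted += 1
            # A server only counts as responding if some document
            # was saved anywhere below its directory.
            if any(files for _, _, files in os.walk(entry.path)):
                responded += 1
    return attempted, responded
```

Calling count_servers on the archive root returns both numbers at once,
so an empty directory left behind by a failed connection shows up in the
first count but not the second.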

So far, the top server in terms of number of selectors downloaded is
serpiente.dgsca.unam.mx with over 57,000.  But many of the top servers
are still being crawled.

-- John

