Re: [gopher] Gopher++ scrapped & Internet Archive -style thingy

To: Gopher Project Discussion <gopher-project@lists.alioth.debian.org>
Subject: Re: [gopher] Gopher++ scrapped & Internet Archive -style thingy
From: Mike Hebel <nimitz@nimitzbrood.com>
Date: Tue, 20 Apr 2010 07:40:23 -0500
Message-id: <01CABA19-0B25-4F7C-978C-348DC92E61BA@nimitzbrood.com>
Reply-to: Gopher Project Discussion <gopher-project@lists.alioth.debian.org>
In-reply-to: <4BCD7322.4070407@holviala.com>
References: <4BCD7322.4070407@holviala.com>


On Apr 20, 2010, at 4:25 AM, Kim Holviala wrote:

As part of my project to code a neat search engine to cover the whole Gopherspace I've (partially) crawled sites and snooped and researched a lot of stuff.
Let's just say that the Gopherspace is small, but interesting. I'm glad I started crawling :-).
Anyway.
Whatever I've written about the gopher++ extra headers can now be considered as "obsolete". I found a few live sites which just cannot accept anything else than a selector<CRLF> so there's no way I can insert extra headers without breaking stuff. Those sites even break with type 7 queries (and gopher+) so I'm kind of giving up now.
All code regarding the header extensions has been scrapped and deleted, it's all gone for good. The good thing is that my code is now 100% compatible with ALL early 90's servers but the bad thing is that the neat charset conversion thingy is now all gone and we're back to 7-bit US-ASCII (or non-working Latin/UTF). Oh, well.
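
For reference, the plain request those old servers expect can be sketched as below. This is only an illustration of the RFC 1436 wire format (selector, optional TAB plus search string for type 7, then CRLF), not the crawler's actual code; the names build_request and fetch are made up for this example, and port 70 is the usual default.

```python
import socket
from typing import Optional


def build_request(selector: str, query: Optional[str] = None) -> bytes:
    """Build a plain gopher request line.

    Plain request:   selector CRLF
    Type 7 (search): selector TAB search-string CRLF
    Nothing else is sent, which is what the strictest servers require.
    """
    line = selector if query is None else f"{selector}\t{query}"
    # 7-bit US-ASCII only; non-ASCII input raises UnicodeEncodeError.
    return line.encode("ascii") + b"\r\n"


def fetch(host: str, selector: str = "", port: int = 70,
          timeout: float = 10.0) -> bytes:
    """Send one request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(selector))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Anything appended after the selector (extra header lines, a gopher+ marker) changes the bytes on the wire, which is why a server that only understands selector<CRLF> can choke on it.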

I'm confused here. Is this the client side of things or the server side? If the goal is to keep Gopher moving forward then why not create a better server with an expanded protocol? And if it's just your servers that do the gopher++ dance then why does it matter if other servers don't? Other than crawling, the servers don't interact as far as I can tell. (Unless I'm once again being dense.)

As my search engine's indexer is an offline one, my spider basically crawls around and saves all type 0 & 1 files to a local cache hierarchy. This was mostly accidental, but I managed to create something very much like The Internet Archive but for gopher. Basically, you give the cache manager a URL and it gives you back the cached page (if it has it) AND it mangles menus so that as long as the pages are in cache you'll stay in the cache.
It's kind of like a combination of Google's cache and archive.org,only it works better than either of those...
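
A rough sketch of what that menu mangling could look like, for the curious. This is a guess at the idea, not the actual cache manager: the "/cache.q?" selector prefix and the host name are inferred from the example URLs in this message, the port is assumed to be 70, and mangle_menu and to_url are hypothetical names.

```python
CACHE_HOST = "gophernicus.org"  # taken from the example URLs in this message
CACHE_PORT = "70"               # assumption: standard gopher port
CACHE_PREFIX = "/cache.q?"      # assumption: inferred from the example URLs


def to_url(item_type: str, selector: str, host: str, port: str) -> str:
    """Turn the fields of a menu item into a gopher:// URL."""
    netloc = host if port == "70" else f"{host}:{port}"
    return f"gopher://{netloc}/{item_type}{selector}"


def mangle_menu(menu: str, cached: set) -> str:
    """Repoint cached type 0/1 menu items at the cache, leave the rest alone.

    A menu line is: <type><display> TAB selector TAB host TAB port
    """
    out = []
    for line in menu.splitlines():
        fields = line.split("\t")
        if len(fields) >= 4 and fields[0][:1] in ("0", "1"):
            url = to_url(fields[0][:1], fields[1], fields[2], fields[3])
            if url in cached:
                # Keep type and display string, swap where the link goes.
                fields[1] = CACHE_PREFIX + url
                fields[2] = CACHE_HOST
                fields[3] = CACHE_PORT
        out.append("\t".join(fields))
    return "\r\n".join(out) + "\r\n"
```

Items that are not in the cache keep their original host, so following one of those is the point where you would leave the cache.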
Here's a cached copy of (partial) Floodgap:
gopher://gophernicus.org/1/cache.q?gopher://gopher.floodgap.com

It even cached itself:
gopher://gophernicus.org/1/cache.q?gopher://gophernicus.org
Notice how the cached Floodgap is much faster than the original one ;D. I wish there was something like this for teh web....
<turtleneck shirt mode on>
One more thing,
</turtleneck>
I'll be crawling everything in about a month or so, so now is the time to fix your robots.txt if you don't want your files to end up in the cache.
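
To check what a robots.txt actually blocks before the crawl, something like the following works as a quick test. It is only a sketch using Python's standard urllib.robotparser; it assumes the robots.txt text has already been fetched, that selectors look like paths starting with "/", and the agent name "example-crawler" is a placeholder since the spider's real user-agent string isn't given here.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only.
ROBOTS_TXT = """User-agent: *
Disallow: /private
"""


def allowed(robots_txt: str, selector: str,
            agent: str = "example-crawler") -> bool:
    """Return True if the rules permit the given agent to fetch the selector."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, selector)


if __name__ == "__main__":
    for sel in ("/private/notes.txt", "/public/readme.txt"):
        print(sel, "allowed" if allowed(ROBOTS_TXT, sel) else "blocked")
```

Whether a particular crawler honors every directive the same way is up to that crawler, so treat this as a sanity check rather than a guarantee.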


Very cool. :-)

--
Mike

"All we wanna do is eat your brains! We're not unreasonable, I mean no one's gonna eat your eyes." - Re: Your Brains, Jonathan Coulton



_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/gopher-project
