Re: [gopher] Gopher++ scrapped & Internet Archive -style thingy
On Apr 20, 2010, at 4:25 AM, Kim Holviala wrote:
As part of my project to code a neat search engine to cover the
whole Gopherspace I've (partially) crawled sites and snooped and
researched a lot of stuff.
Let's just say that the Gopherspace is small, but interesting. I'm
glad I started crawling :-).
Anyway.
Whatever I've written about the gopher++ extra headers can now be
considered as "obsolete". I found a few live sites which just cannot
accept anything other than a selector<CRLF> so there's no way I can
insert extra headers without breaking stuff. Those sites even break
with type 7 queries (and gopher+) so I'm kind of giving up now.
All code regarding the header extensions has been scrapped and
deleted, it's all gone for good. The good thing is that my code is
now 100% compatible with ALL early 90's servers but the bad thing is
that the neat charset conversion thingy is now all gone and we're
back to 7-bit US-ASCII (or non-working Latin/UTF). Oh, well.
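For readers unfamiliar with the constraint described above, here is a minimal Python sketch of what a plain request looks like on the wire: the selector, an optional TAB plus search string for type 7 items, then CRLF, and nothing else. The function names `build_request` and `fetch` are made up for illustration and are not taken from the crawler discussed in this thread.

```python
import socket
from typing import Optional


def build_request(selector: str, query: Optional[str] = None) -> bytes:
    """Build the single request line a plain Gopher client sends.

    The line is the selector, optionally followed by TAB and a search
    string (type 7 items), terminated by CRLF. No extra headers.
    """
    line = selector if query is None else f"{selector}\t{query}"
    # Characters outside 7-bit US-ASCII are replaced with "?" here;
    # that is a choice made for this sketch, not protocol behaviour.
    return line.encode("ascii", errors="replace") + b"\r\n"


def fetch(host: str, selector: str = "", port: int = 70,
          timeout: float = 10.0) -> bytes:
    """Send one request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(selector))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

An empty selector, sent as a bare CRLF, asks for the server's root menu. Anything appended after the CRLF is what the strict servers mentioned above would choke on.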
I'm confused here. Is this the client side of things or the server
side? If the goal is to keep Gopher moving forward then why not
create a better server with an expanded protocol? And if it's just
your servers that do the gopher++ dance then why does it matter if
other servers don't? Other than crawling, the servers don't interact
as far as I can tell. (Unless I'm once again being dense.)
As my search engine's indexer is an offline one, my spider basically
crawls around and saves all type 0&1 files to a local cache
hierarchy. This was mostly accidental, but I managed to create
something very much like The Internet Archive but for gopher.
Basically, you give the cache manager a URL and it gives you back
the cached page (if it has it) AND it mangles menus so that as long
as the pages are in cache you'll stay in the cache.
It's kind of like a combination of Google's cache and archive.org,
only it works better than either of those...
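The menu mangling described above could look roughly like the Python sketch below. It is only a guess at the idea, not the actual cache manager: the `/cache.q?<url>` selector shape is copied from the example links in this post, while `item_url`, `mangle_menu`, the URL format used as the cache key, and the `cached` set are all assumptions made for illustration.

```python
def item_url(item_type: str, selector: str, host: str, port: str) -> str:
    """Build a gopher:// URL for a menu item (assumed cache key format)."""
    netloc = host if port == "70" else f"{host}:{port}"
    return f"gopher://{netloc}/{item_type}{selector}"


def mangle_menu(menu: str, cached: set, cache_host: str,
                cache_port: str = "70",
                cache_selector: str = "/cache.q") -> str:
    """Point cached type 0 and 1 items at the cache; leave the rest alone."""
    out = []
    for line in menu.split("\r\n"):
        fields = line.split("\t")
        # Menu items look like: <type><display> TAB selector TAB host TAB port
        if len(fields) < 4 or not fields[0]:
            out.append(line)  # terminator ".", blank or malformed line
            continue
        item_type = fields[0][0]
        selector, host, port = fields[1], fields[2], fields[3]
        url = item_url(item_type, selector, host, port)
        if item_type in "01" and url in cached:
            # Keep the display text, swap in the cache selector and host.
            fields = [fields[0], f"{cache_selector}?{url}",
                      cache_host, cache_port] + fields[4:]
            out.append("\t".join(fields))
        else:
            out.append(line)
    return "\r\n".join(out)
```

With this approach, links to pages that are not in the cache still point at the live server, which matches the "as long as the pages are in cache you'll stay in the cache" behaviour.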
Here's a cached copy of (partial) Floodgap:
gopher://gophernicus.org/1/cache.q?gopher://gopher.floodgap.com
It even cached itself:
gopher://gophernicus.org/1/cache.q?gopher://gophernicus.org
Notice how the cached Floodgap is much faster than the original
one ;D. I wish there was something like this for teh web....
<turtleneck shirt mode on>
One more thing,
</turtleneck>
I'll be crawling everything in about a month or so, so now is the
time to fix your robots.txt if you don't want your files to end up
in the cache.
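For server admins who want to check what a robots.txt would block before the crawl, a small Python sketch using the standard library's `urllib.robotparser` follows. The post does not say which user agent name the crawler uses or exactly how it matches selectors, so `allowed`, the `examplebot` name in the usage note, and the sample rules are assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, selector: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch selector."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, selector)


# Hypothetical rules: keep every crawler out of /private.
EXAMPLE_RULES = """User-agent: *
Disallow: /private
"""
```

Calling `allowed(EXAMPLE_RULES, "examplebot", "/private/notes.txt")` reports the selector as blocked, while a selector under a different prefix is reported as fetchable.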
Very cool. :-)
--
Mike
"All we wanna do is eat your brains! We're not unreasonable, I mean no
one's gonna eat your eyes." - Re: Brains, Jonathan Coulton
_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/gopher-project