
Re: BenCox's Gopher Index


On Mon, 13 Aug 2018 11:08:33 +0200 Matt Owen <matt@jaruzel.com> wrote:
> Saw this link on Hacker News and thought it was worth posting here:
> https://blog.benjojo.co.uk/post/building-a-search-engine-for-gopher

We have been talking about this at the current brcon.

The slides and recordings are online.

I have seen many people try to write some search engine, crawler etc. for
gopherspace, and every time the resulting data ends up hidden, either because
of bad source code or loss of interest once some analysis was done. I was
wondering why this keeps happening. My conclusion: at first it sounds easy,
but then the deployment for the community is hard.

Here are the slides from my talk:

   A Possible Gopher Search (0/1)

 o There is Veronica, but according to Cameron Kaiser it has an ugly source base.
 o The idea of a brand-new Gopher Search is on the table.
 o We had different crawlers this year:
         o gopher://kalos.mine.nu/1/burrow/
         o https://github.com/blabber/grawler
         o ... and more.
 o They all end up in closed indexes which are not reused.
 o We are a hobbyist-hosted network and thus have no indefinite resources.
 o Gopher should be the better web.
         o Is a closed search engine a better web?

   A Possible Gopher Search (1/1)

 My Proposal:
 o Let us define robots.txt for gopher.
         User-Agent: Google
         Disallow: /
         Crawl-Index: gopher://bitreich.org/0/index.crawl
 o Let us define an index format, which describes all needed metadata for
   crawler results.
         o All gopherholes can simply offer it, so crawlers do not waste
           bandwidth crawling everything themselves.
                 o e.g. gopher://bitreich.org/0/index.crawl
                 o If that file exists, the crawler will not go further.
 o Let us have different parts of the search infrastructure:
         o Crawlers, which output the above format.
         o Indexers, which index the above format using current search
           techniques or graph theory.
         o Interfaces, which show the results.
 o With those layers and open-source development we set ourselves apart from
   the web and its closed architecture.
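To make the robots.txt idea concrete, here is a minimal Python sketch of how a
crawler could fetch and honour it. The directive names match the slide above;
everything else (the `gopher_fetch` helper, the `parse_robots` rule shape) is a
hypothetical illustration for discussion, not an agreed specification:

```python
import socket

def gopher_fetch(host, selector, port=70, timeout=10):
    """Fetch a raw gopher selector: send it CRLF-terminated, read to EOF."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_robots(text):
    """Parse the proposed robots.txt directives into per-agent rules."""
    rules, agent = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            agent = value
            rules.setdefault(agent, {"disallow": [], "crawl-index": None})
        elif agent and key == "disallow":
            rules[agent]["disallow"].append(value)
        elif agent and key == "crawl-index":
            rules[agent]["crawl-index"] = value
    return rules
```

A crawler would fetch the gopherhole's robots.txt selector first, look up its
own user-agent, and, if a Crawl-Index is advertised, fetch only that file
instead of walking the whole menu tree.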

As you can see, I want to modularize, so we do not end up going the Google
way, with data hidden from the community.
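The index format itself is still undefined. Purely as an assumption to start
the discussion, one possible shape is one tab-separated line per crawled item
(item type, selector, host, port, last-modified epoch); the `crawl_entry`
helper below is a made-up name sketching that idea in Python:

```python
import time

def crawl_entry(item_type, selector, host, port=70, mtime=None):
    """Serialize one hypothetical index.crawl line: tab-separated
    item type, selector, host, port, last-modified epoch seconds."""
    mtime = int(mtime if mtime is not None else time.time())
    return f"{item_type}\t{selector}\t{host}\t{port}\t{mtime}"
```

Tab-separated lines would keep the format greppable and awkable, in keeping
with how gopher menus themselves are structured.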

The TODO list:

1.) Make the robots.txt happen.
2.) Write some gopher-mirror(1), to have a dir/menu structure.
3.) Have the raw data available for crawlers, as tgz of the above structure.
4.) Host some public community-agreed mirror of that data.
	* gopherproject.org? I can offer space.
5.) Write indexers.
	* One person at brcon was just interested in linguistics.
	* One person was asking for just used menu item types.
	* Search.
6.) Offer interfaces for the search.
	* Gopher.
	* IRC bot at bitreich.org
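The "used menu item types" analysis mentioned under item 5 is simple enough to
sketch. A minimal Python example, assuming standard gopher menu lines (type
character first, tab-separated display/selector/host/port fields, a lone "."
terminating the menu):

```python
from collections import Counter

def menu_item_types(menu_text):
    """Count gopher menu item types: the first character of each
    menu line is the item type (0 = text, 1 = menu, i = info, ...)."""
    counts = Counter()
    for line in menu_text.splitlines():
        if not line or line == ".":
            continue  # skip blank lines and the terminating dot
        counts[line[0]] += 1
    return counts
```

Run over a full mirror, this kind of one-pass statistic is exactly what the
raw tgz data from items 3 and 4 would make cheap for everyone.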

My goal is to bundle community resources and not waste the bandwidth of small
systems the way the web does. We can be different from the web.


Christoph Lohmann

💻 https://r-36.net
💻 gopher://r-36.net
🔐 1C3B 7E6F 9805 E5C8 C0BD  1F7F EA21 7AAC 09A9 CB55
🔐 http://r-36.net/about/20h.asc
📧 20h@r-36.net
