
Re: gopher-project@other.debian.org



It was thus said that the Great solderpunk once stated:
> Greetings Gophernauts,
> 
> I have two questions regarding the correct way to format URLs for Gopher
> search engines when the search term is to be specified.
> 
> The first is: to what extent is RFC 4266 ("The gopher URI Scheme") being
> adhered to by the modern Gopher community?  Are we trying to follow it,
> or is it being actively ignored in the same way that Gopher+ is being
> actively ignored?

  I'm not sure of the answer to this.  Having written both a generic URL
parser [1] and a separate one just for gopher [2], I did wonder why RFC-4266
never used a URL query string.  On reflection, I suspect the intent was to
ease support of gopher URLs in gopher clients.  Consider a gopher URL like:

	gopher://example.com/7search%09the%20answer%20to%20everything

  Strip off the scheme part:

	//example.com/7search%09the%20answer%20to%20everything

  Skip the leading '//' and split off the authority section [3] to get the
host name:

	example.com

leaving the path portion:

	/7search%09the%20answer%20to%20everything

  Now decode the resulting string and you get:

	"/7search	the answer to everything"

  Ignore the leading '/' (required by the URL syntax for such things).  The
first character of the remaining string is the type, and the rest is what
you can send *exactly* to the gopher server to get the results.  The code to
"parse" a gopher URL is pretty easy with regular expressions (and excuse me
if I get this wrong, but I don't use regexes all that often):

	host,type,request = /^gopher://([^/]+)/(.)(.*)
	request = url_decode(request)

  I can't prove this, but I strongly suspect this is why we got the gopher
URL syntax we did (one complication---parsing the optional port number, but
I don't know regex syntax well enough to implement that).
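
  For what it's worth, here is a minimal sketch in Python of the steps
above, including the optional port.  The function name and the returned
tuple are my own invention for illustration, not something from RFC-4266.
It assumes a plain host name (no IPv6 literal, no user info) and at least a
type character after the '/'; it falls back to port 70 when none is given.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: host, optional ":port", then the type character
# and the rest of the path.  IPv6 literals are deliberately not handled.
GOPHER_URL = re.compile(r'^gopher://([^/:]+)(?::(\d+))?/(.)(.*)$')

def parse_gopher_url(url):
    """Split a gopher URL into (host, port, type, request)."""
    m = GOPHER_URL.match(url)
    if m is None:
        raise ValueError("no host, type and selector found in: " + url)
    host, port, gtype, request = m.groups()
    # Percent-decode only the request; %09 becomes a literal tab.
    return host, int(port) if port else 70, gtype, unquote(request)

print(parse_gopher_url(
    "gopher://example.com/7search%09the%20answer%20to%20everything"))
```

  The decoded request is what would go to the server, followed by CRLF.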

> RFC 4266 says very clearly in section 2.2 that:
> 
> > If the URL refers to a search to be submitted to a Gopher search
> > engine, the selector is followed by an encoded tab (%09) and the
> > search string. 
> 
> This is consistent with the earlier syntax from section 2.1:
> 
> > A Gopher URL takes the form:
> >
> > gopher://<host>:<port>/<gopher-path>
> >
> > where <gopher-path> is one of:
> >
> > <gophertype><selector>
> > <gophertype><selector>%09<search>
> > <gophertype><selector>%09<search>%09<gopher+_string>
> 
> However, if I use Lynx to navigate to the Veronica 2 search engine at
> Floodgap and do a search for "cheese", then use the = button to get Lynx
> to show me the URL of my current location, it tells me I am at:
> 
> gopher://gopher.floodgap.com/7/v2/vs?cheese
> 
> Note the use of ? instead of %09 to separate the search term from the
> selector.
> 
> I tried to see what other clients do here to see if there was a rough
> consensus, but was surprised to find that very few clients actually
> provide a way to get the URL of the current Gopher item!  VF-1 does, but
> it doesn't include search terms at all, which is something I'd like to
> fix.

  That's probably because URLs aren't used that often (or weren't used that
often) in gopher.  The 'h' type is not specified in RFC-1436, and gopher
maps were developed before URLs were a thing.

> Most clients of course do include a way to navigate to a URL, so I was
> able to test visiting Veronica result pages using both kinds of syntax.
> It seems that most clients work with the Lynx-style URL where the search
> term is separated from the selector by a ?, and fail with %09 URLs.
> It's not clear to me whether clients are replacing the ? with an
> unencoded tab when they send the request to Veronica, or whether Veronica
> recognises such requests as an alternative syntax and treats them as
> equivalent to RFC 1436 compliant requests using a tab separator.
> 
> My second question is: what is the correct item type to use for a URL
> which includes a search term?  As far as I can see, RFC 4266 is silent
> on this matter.  Lynx uses item type 7.  This might initially seem
> obviously correct, but I think it's not actually so clear cut.

  I can see the use of:

	gopher://example.com/7search
	gopher://example.com/1search%09term

(that is, use '7' when there's no search term, but '1' otherwise).
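
  A rough sketch of that convention in Python, to make it concrete.  The
helper name and its arguments are made up for this example, and the '7'
versus '1' choice is just the idea floated above, not anything RFC-4266
mandates:

```python
from urllib.parse import quote

def search_url(host, selector, term=None):
    """Build a gopher URL for a search selector (hypothetical helper)."""
    if term is None:
        # No term yet: type 7, so a client knows to prompt for one.
        return f"gopher://{host}/7{quote(selector)}"
    # Term included: type 1, with the RFC-4266 %09 separator.
    return f"gopher://{host}/1{quote(selector)}%09{quote(term, safe='')}"

print(search_url("example.com", "search"))
print(search_url("example.com", "search", "term"))
```

  The two calls print the two example URLs given above.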

> Assuming the search query is represented with the %09 separator as per
> RFC 4266, and the client is smart enough to undo URL encoding before
> sending the selector to the server, then using item type 1 would mean
> that such URLs "just work" without any special consideration on the
> part of the client.  In contrast, using item type 7 means that clients
> asked to visit a URL with item type 7 need to examine the path of the
> URL for either a ? or a %09 separator in order to decide how to proceed,
> whether they prompt the user to input a search term or whether there is
> a term already included in the URL.  The former seems a bit more elegant
> to me.

  I would hate to have to attempt to merge RFC-4266 with RFC-3986.  The two
types of URLs are semantically quite different (for starters, RFC-4266
doesn't support the full authority syntax from RFC-3986).

  -spc

[1]	https://github.com/spc476/LPeg-Parsers/blob/master/url.lua

[2]	https://github.com/spc476/LPeg-Parsers/blob/master/url/gopher.lua

[3]	Terminology from RFC-3986

