Re: gopher-project@other.debian.org
It was thus said that the Great solderpunk once stated:
> Greetings Gophernauts,
>
> I have two questions regarding the correct way to format URLs for Gopher
> search engines when the search term is to be specified.
>
> The first is: to what extent is RFC 4266 ("The gopher URI Scheme") being
> adhered to by the modern Gopher community? Are we trying to follow it,
> or is it being actively ignored in the same way that Gopher+ is being
> actively ignored?
I'm not sure of the answer to this. Having written both a generic URL
parser [1] and a separate one just for gopher [2], I did wonder why RFC-4266
never used a URL query string. On reflection, I suspect the intent was to
make gopher URLs easy to support in gopher clients. Consider a gopher URL
like:
gopher://example.com/7search%09the%20answer%20to%20everything
Strip off the scheme part:
//example.com/7search%09the%20answer%20to%20everything
Skip the leading '//' and pull off the authority section [3] to get the
host name:
example.com
leaving the path portion:
/7search%09the%20answer%20to%20everything
Now decode the resulting string and you get:
"/7search the answer to everything"
Ignore the leading '/' (required by the URL syntax for such things). The
first character of the remaining string is the type, and the rest is what
you can send *exactly* to the gopher server to get the results. The code to
"parse" a gopher URL is pretty easy with regular expressions (and excuse me
if I get this wrong, but I don't normally use regexes all that often):
host,type,request = match(url, /^gopher:\/\/([^\/]+)\/(.)(.*)$/)
request = url_decode(request)
I can't prove this, but I strongly suspect this is why we got the gopher
URL syntax we did (one complication---parsing the optional port number, but
I don't know regex syntax well enough to implement that).
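For illustration only, here is a minimal Python sketch of the steps above
that also picks up the optional port number. The function name
parse_gopher_url and its return shape are assumptions made for this example,
not anything defined by RFC 4266, and it does not attempt IPv6 literals or
userinfo in the authority. Port 70 is the standard gopher default.

```python
import re
from urllib.parse import unquote

# host, optional ":port", one type character, then the rest of the path.
# Hypothetical helper for illustration; not taken from any existing client.
GOPHER_URL = re.compile(r"^gopher://([^/:]+)(?::(\d+))?/(.)(.*)$", re.DOTALL)

def parse_gopher_url(url):
    """Return (host, port, item_type, request) for a gopher URL."""
    m = GOPHER_URL.match(url)
    if m is None:
        raise ValueError("no host, type and selector found in: " + url)
    host, port, item_type, request = m.groups()
    # Decoding turns %09 into a real tab, so the request can be sent as-is
    # (followed by CRLF) to the server.
    return host, int(port) if port else 70, item_type, unquote(request)
```

Calling it on the example URL above should give the host "example.com",
port 70, type "7", and a request of "search", a tab, then
"the answer to everything". A URL with no type character after the host
(for example "gopher://example.com/") is rejected by this sketch, which a
real client would probably want to treat as a type 1 request for the root.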
> RFC 4266 says very clearly in section 2.2 that:
>
> > If the URL refers to a search to be submitted to a Gopher search
> > engine, the selector is followed by an encoded tab (%09) and the
> > search string.
>
> This is consistent with the earlier syntax from section 2.1:
>
> > A Gopher URL takes the form:
> >
> > gopher://<host>:<port>/<gopher-path>
> >
> > where <gopher-path> is one of:
> >
> > <gophertype><selector>
> > <gophertype><selector>%09<search>
> > <gophertype><selector>%09<search>%09<gopher+_string>
>
> However, if I use Lynx to navigate to the Veronica 2 search engine at
> Floodgap and do a search for "cheese", then use the = button to get Lynx
> to show me the URL of my current location, it tells me I am at:
>
> gopher://gopher.floodgap.com/7/v2/vs?cheese
>
> Note the use of ? instead of %09 to separate the search term from the
> selector.
>
> I tried to see what other clients do here to see if there was a rough
> consensus, but was surprised to find that very few clients actually
> provide a way to get the URL of the current Gopher item! VF-1 does, but
> it doesn't include search terms at all, which is something I'd like to
> fix.
That's probably because URLs aren't used that often (or weren't used that
often) in gopher. The 'h' type is not specified in RFC-1436, and gopher
maps were developed before URLs were a thing.
> Most clients of course do include a way to navigate to a URL, so I was
> able to test visiting Veronica result pages using both kinds of syntax.
> It seems that most clients work with the Lynx-style URL where the search
> term is separated from the selector by a ?, and fail with %09 URLs.
> It's not clear to me whether clients are replacing the ? with an
> unencoded tab when they send the request to Veronica, or whether Veronica
> recognises such requests as an alternative syntax and treats them
> equivalent to RFC 1436 compliant requests using a tab separator.
>
> My second question is: what is the correct item type to use for a URL
> which includes a search term? As far as I can see, RFC 4266 is silent
> on this matter. Lynx uses item type 7. This might initially seem
> obviously correct, but I think it's not actually so clear cut.
I can see the use of:
gopher://example.com/7search
gopher://example.com/1search%09term
(that is, use '7' when there's no search term, but '1' otherwise).
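As a rough sketch of what a client might do with the path part of such URLs,
the Python below accepts both the RFC 4266 separator (%09) and the Lynx-style
'?' separator. The names split_search and request_line are made up for this
example, and treating '?' as a separator only for type 7 is an assumption,
not something either RFC specifies. Gopher+ strings are not handled.

```python
from urllib.parse import unquote

def split_search(gopher_path):
    """Split '<type><selector>[<sep><search>]' into (type, selector, search).

    gopher_path is the URL path with the leading '/' already removed and
    still percent-encoded. search is None when no term is present.
    """
    item_type, rest = gopher_path[:1], gopher_path[1:]
    if "%09" in rest:
        # RFC 4266 form: an encoded tab separates selector and search.
        selector, search = rest.split("%09", 1)
    elif item_type == "7" and "?" in rest:
        # Lynx-style form (assumption: only honoured for type 7).
        selector, search = rest.split("?", 1)
    else:
        selector, search = rest, None
    if search is not None:
        search = unquote(search)
    return item_type, unquote(selector), search

def request_line(selector, search):
    """Build the line sent to the server: selector, optional tab + search."""
    if search is None:
        return selector + "\r\n"
    return selector + "\t" + search + "\r\n"
```

With this, "7/v2/vs?cheese" and "7/v2/vs%09cheese" both yield the selector
"/v2/vs" and the search term "cheese", while "7search" yields no search term,
which is the case where a client would prompt the user for one.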
> Assuming the search query is represented with the %09 separator as per
> RFC 4266, and the client is smart enough to undo URL encoding before
> sending the selector to the server, then using item type 1 would mean
> that such URLs "just work" without any special consideration on the
> part of the client. In contrast, using item type 7 means that clients
> asked to visit a URL with item type 7 need to examine the path of the
> URL for either a ? or a %09 separator in order to decide how to proceed,
> whether they prompt the user to input a search term or whether there is
> a term already included in the URL. The former seems a bit more elegant
> to me.
I would hate to have to attempt to merge RFC-4266 with RFC-3986---the two
types of URLs are semantically quite different (for instance, RFC-4266
doesn't support the full authority syntax from RFC-3986 just for starters).
-spc
[1] https://github.com/spc476/LPeg-Parsers/blob/master/url.lua
[2] https://github.com/spc476/LPeg-Parsers/blob/master/url/gopher.lua
[3] Terminology from RFC-3986