It was thus said that the Great solderpunk once stated:
> Greetings Gophernauts,
> I have two questions regarding the correct way to format URLs for Gopher
> search engines when the search term is to be specified.
> The first is: to what extent is RFC 4266 ("The gopher URI Scheme") being
> adhered to by the modern Gopher community? Are we trying to follow it,
> or is it being actively ignored in the same way that Gopher+ is being
> actively ignored?
I'm not sure of the answer to this. Having written both a generic URL
parser and a separate one just for gopher, I did wonder why RFC-4266
never used a URL query string. On reflection, I suspect the intent was to
ease support of gopher URLs in gopher clients. Because if you think
about it, a gopher URL like:
Strip off the scheme part:
Skip the leading '//' and remove the authority section, leaving the path
portion:
Now decode the resulting string and you get:
"/7search the answer to everything"
Ignore the leading '/' (required by the URL syntax for such things). The
first character of the remaining string is the type, and the rest is what
you can send *exactly* to the gopher server to get the results. The code to
"parse" a gopher URL is pretty easy with regular expressions (and excuse me
if I get this wrong, but I don't normally use regexes all that often):
host,type,request = /^gopher:\/\/([^\/]+)\/(.)(.*)$/
request = url_decode(request)
I can't prove this, but I strongly suspect this is why we got the gopher
URL syntax we did (one complication---parsing the optional port number, but
I don't know regex syntax well enough to implement that).
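
For what it's worth, the port can be handled with one more optional group.
Here is a minimal sketch in Python (my choice for illustration, nothing the
RFC prescribes). The name parse_gopher_url and the example.com host are made
up; the default port of 70 and the default type of '1' for an empty path are
taken from RFC 4266. It does not attempt IPv6 literals.

```python
import re
from urllib.parse import unquote

# host, optional ":port", then optionally "/" + type character + the rest.
# Hypothetical helper pattern; IPv6 literals in brackets are not handled.
GOPHER_URL = re.compile(r"gopher://([^/:]+)(?::(\d+))?(?:/(?:(.)(.*))?)?")

def parse_gopher_url(url):
    """Split a gopher URL into (host, port, type, request)."""
    m = GOPHER_URL.fullmatch(url)
    if m is None:
        raise ValueError("not a gopher URL: " + url)
    host, port, item_type, request = m.groups()
    return (
        host,
        int(port) if port else 70,   # RFC 4266 default port
        item_type or "1",            # empty path defaults to type 1
        unquote(request or ""),      # %09 becomes a real tab here
    )

print(parse_gopher_url("gopher://example.com:7070/0/about.txt"))
```

The decoded request is what gets sent to the server, followed by CRLF.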
> RFC 4266 says very clearly in section 2.2 that:
> > If the URL refers to a search to be submitted to a Gopher search
> > engine, the selector is followed by an encoded tab (%09) and the
> > search string.
> This is consistent with the earlier syntax from section 2.1:
> > A Gopher URL takes the form:
> > gopher://<host>:<port>/<gopher-path>
> > where <gopher-path> is one of:
> > <gophertype><selector>
> > <gophertype><selector>%09<search>
> > <gophertype><selector>%09<search>%09<gopher+_string>
> However, if I use Lynx to navigate to the Veronica 2 search engine at
> Floodgap and do a search for "cheese", then use the = button to get Lynx
> to show me the URL of my current location, it tells me I am at:
> Note the use of ? instead of %09 to separate the search term from the
> selector.
> I tried to see what other clients do here to see if there was a rough
> consensus, but was surprised to find that very few clients actually
> provide a way to get the URL of the current Gopher item! VF-1 does, but
> it doesn't include search terms at all, which is something I'd like to
That's probably because URLs aren't used that often (or weren't used that
often) in gopher. The 'h' type is not specified in RFC-1436, and gopher
maps were developed before URLs were a thing.
> Most clients of course do include a way to navigate to a URL, so I was
> able to test visiting Veronica result pages using both kinds of syntax.
> It seems that most clients work with the Lynx-style URL where the search
> term is separated from the selector by a ?, and fail with %09 URLs.
> It's not clear to me whether clients are replacing the ? with an
> unencoded tab when they send the request to Veronica, or whether Veronica
> recognises such requests as an alternative syntax and treats them
> equivalent to RFC 1436 compliant requests using a tab separator.
> My second question is: what is the correct item type to use for a URL
> which includes a search term? As far as I can see, RFC 4266 is silent
> on this matter. Lynx uses item type 7. This might initially seem
> obviously correct, but I think it's not actually so clear cut.
I can see the use of:
(that is, use '7' when there's no search term, but '1' otherwise).
> Assuming the search query is represented with the %09 separator as per
> RFC 4266, and the client is smart enough to undo URL encoding before
> sending the selector to the server, then using item type 1 would mean
> that such URLs "just work" without any special consideration on the
> part of the client. In contrast, using item type 7 means that clients
> asked to visit a URL with item type 7 need to examine the path of the
> URL for either a ? or a %09 separator in order to decide how to proceed,
> whether they prompt the user to input a search term or whether there is
> a term already included in the URL. The former seems a bit more elegant
> to me.
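
To make the difference concrete, here is a rough Python sketch of what a
client would have to do under each convention. Everything in it is
hypothetical: request_line, the ask_user callback and the /search selector
are names I made up, and treating '?' as a separator is only the Lynx-style
behaviour described above, not something RFC 4266 defines.

```python
from urllib.parse import unquote

def request_line(item_type, path, ask_user):
    """Build the bytes sent to the server for one gopher URL.

    item_type is the type character from the URL, path is the
    still-encoded text that follows it, and ask_user is a callback
    that prompts for a search term.
    """
    if item_type == "7" and "%09" not in path:
        # Type 7 without an encoded tab: look for a Lynx-style '?',
        # and fall back to prompting when no term is present.
        selector, sep, term = path.partition("?")
        selector = unquote(selector)
        term = unquote(term) if sep else ask_user()
        request = selector + "\t" + term
    else:
        # Anything else is decoded and sent exactly as given.
        request = unquote(path)
    return request.encode("utf-8") + b"\r\n"

# Type 1 with %09 needs no special handling at all.
print(request_line("1", "/search%09cheese", lambda: ""))
```

With type 1 and %09, only the else branch is ever needed, which is the
"just works" case; the extra logic exists only for type 7.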
I would hate to have to attempt to merge RFC-4266 with RFC-3986---the two
types of URLs are semantically quite different (for instance, RFC-4266
doesn't support the full authority syntax from RFC-3986 just for starters).
 Terminology from RFC-3986