
Re: [gopher] Updated Gopher RFC



On 20 May 2012 07:09, Bradley D. Thornton <Bradley@northtech.us> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
>
>
> On 05/19/2012 12:28 PM, Nick Matavka wrote:
>> On 19 May 2012 14:11, Nuno J. Silva <nunojsilva@ist.utl.pt> wrote:
>>> On 2012-05-19, Nick Matavka wrote:
>>>
>>>> I have an idea for selectors.  There is a large difference between
>>>> word processing files and page layout languages, so I recommend
>>>> keeping both d and p selectors.  d could be for word processing
>>>> documents such as Word, OpenOffice, and WordPerfect, and p could be
>>>> for page layout and markup languages such as LaTeX, PDF, PostScript,
>>>> and Rich Text Format.
>>>
>>> About LaTeX, I believe either you're serving the code as something for
>>> the user to read (hence type 0 or another type that gets established for
>
> I expect to read a 0 in my client, and download a 9 so I can choose from
> the myriad of apps I choose between to open a particular type of file.
>
> Wouldn't anything else require bloated intelligence on the part of my
> client?
>
>
> - --
> Bradley D. Thornton
> Manager Network Services
> NorthTech Computer
> TEL: +1.310.388.9469  (US)
> TEL: +44.203.318.2755 (UK)
> TEL: +41.43.508.05.10 (CH)
> http://NorthTech.US
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Find this cert at x-hkp://pool.sks-keyservers.net
>
> iQEcBAEBAwAGBQJPuNDfAAoJEE1wgkIhr9j39+wIALsgwIWjypUSgt+Ns0gl8/0s
> SJvTTFCIp4hUAaQdNcfkK7J3pCgtm2wceObciV+IDqih2X1uAYRPr4DpODZm2Crx
> sBLhK85tuJs/FRHHN4MtAesPdRrdzY0kHZsiMrGz1BT0glMEz3t9wq8MNu3nxGBo
> CDJhBl+xoc/6r+nYhO/A+g8AiPxJhvR4slS/8uxiuZxbW7alUDO/txSmlPPKgngz
> rF+ndmhQcWOO3IRqzhu+gc+WI23oazrGwo9i1d/Jw01RGUyouGiKWEKAyAdui6OM
> W4GscwJ6ZAmisxLra2XR3HLTIz3sXFZTb4PlWLTl8+1ID+N7B5ZTpVuhLhwlKs8=
> =CylN
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Gopher-Project mailing list
> Gopher-Project@lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/gopher-project

Well, gentlefolks, here it is.  My draft RFC.  It is here in three
formats, and you can choose depending on the capabilities of your
computer or bandwidth.  The .md file is a Markdown version, i.e.
GENTLY marked-up plain text.  The HTML version is plain HTML and should
display in any browser, including Lynx.

There is also the fancy Google Docs version for editors, at
www.tinyurl.com/gopherrfc-draft

-- Ted Matavka

-- 
       /^\/^\
       \----|
   _---'---~~~~-_
    ~~~|~~L~|~~~~
       (/_  /~~--
     \~ \  /  /~
   __~\  ~ /   ~~----,
   \    | |       /  \
   /|   |/       |    |
   | | | o  o     /~   |
 _-~_  |        ||  \  /
(// )) | o  o    \\---'
//_- |  |          \
//   |____|\______\__\
~      |   / |    |
       |_ /   \ _|
     /~___|  /____\
Title: Goals

Goals

  1. Author a contemporary specification of the Gopher communications protocol. Do not attempt to reinvent the wheel.
  2. Combine Gopher standards into one reference document, reflecting actual practice.
  3. RFC 1436 compliance is a must
  4. Describe and/or update handling of errors
  5. Describe and/or update handling of “URL:”
  6. Describe and implement item type “M” (MIME) handling

What is Gopher?

Gopher is a lightweight, client-server, query-answer protocol that functions as a world-wide information system (WWIS). It accomplishes its purpose by facilitating access to other servers around the world, whether or not they run Gopher. The protocol and software permit users on a heterogeneous mix of desktop systems to browse, search, and retrieve documents residing on multiple distributed server machines. Lighter than HTTP and more strictly hierarchical, Gopher is well suited to transmitting information to and from mobile devices.

Basic Gopher Transactions

There are four broad forms of basic transactions in Gopher:

* Menu transactions
* Index transactions
* Simple text transactions
* Binary transactions

The precise make-up of these transactions is elucidated below.

Menu Transaction
----------------
Client : [Open Connexion]
Client : Send [selector<CR><LF>]
Server : Send [<Menu>]
Server : [Close Connexion]

Index Transaction
-----------------
Client : [Open Connexion]
Client : Send [selector<TAB>query parameters<CR><LF>]
Server : Send [<Menu>]
Server : [Close Connexion]

Simple Text Transaction
-----------------------
Client : [Open Connexion]
Client : Send [selector<CR><LF>]
Server : Send [<Simple Text>]
Server : [Close Connexion]

Binary Transaction
------------------
Client : [Open Connexion]
Client : Send [selector<CR><LF>]
Server : Send [<Raw Binary Data>]
Server : [Close Connexion]

Gopher servers are normally found on TCP port 70. Clients should assume this port if no other port is specified. When a client opens a connection, the server should accept it but say nothing, waiting for a CRLF-terminated selector string from the client. The client should then send the selector string followed by CRLF. To retrieve the root menu, which MUST always be type 1, the client sends an empty selector, that is, the CRLF alone. The server should then send the requested content and close the connection.
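The exchange described above can be sketched in a few lines of Python. This is only an illustration, not a reference client: the names `build_request` and `fetch`, the ten-second timeout, and the choice of UTF-8 for encoding the selector are all assumptions made for the example.

```python
import socket
from typing import Optional

GOPHER_PORT = 70  # default port, used when none is specified


def build_request(selector: str = "", query: Optional[str] = None) -> bytes:
    """Build the single CRLF-terminated line a client sends.

    An empty selector asks for the root menu. A query, if given, is
    appended after a TAB (index transaction).
    """
    if any(ch in selector for ch in "\t\r\n"):
        raise ValueError("selector may not contain TAB, CR, or LF")
    line = selector if query is None else selector + "\t" + query
    # Assumption: UTF-8 is used to turn the text into bytes.
    return line.encode("utf-8") + b"\r\n"


def fetch(host: str, selector: str = "", port: int = GOPHER_PORT,
          query: Optional[str] = None, timeout: float = 10.0) -> bytes:
    """Open a connection, send one request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(selector, query))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: transfer complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("gopher.example.com")` with no selector would request the root menu of that (placeholder) host.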

Line Terminators

ASCII, the standard that governs the interchange of plain-text information between computer systems, is nothing more or less than a table mapping each character (letter, number, space, or symbol) to a numerical code, which is then stored or transmitted in binary. Its necessity was seen long before the advent of the electronic monitor, so some of its quirks must be understood in view of the period that produced it. Historically, input and output were through a specially-adapted typewriter, and the ASCII convention reflects this in the codes it uses to terminate lines of text.

In ASCII, there are two codes, both having physical equivalents in the real world, that signal the end of the line: the Carriage Return (abbreviated C/R, CR, or c/r; code 13) and the Line Feed (abbreviated L/F, LF, or l/f; code 10). Originally, the term carriage return was used for a command that caused the assembly holding the paper (the carriage) to return to the right so the machine was ready to type again on the left side of the paper (assuming a left-to-right language). The line feed, on the other hand, moved the paper upwards so that typing could continue on the following line.

Different operating systems traditionally signal the end of a line in different ways. UNIX and its descendants (including Mac OS X and later), the operating systems most likely to run on a server, use the line feed alone. CP/M, DOS, and Microsoft Windows use the sequence of carriage return and line feed (CR/LF). Classic versions of Mac OS (up to, and including, Mac OS 9) use the carriage return alone.

Throughout the Gopher system, however, these local conventions do not apply. Internal Gopher commands must always use CR/LF, regardless of the type of system sending them. The situation is less clear when sending text files. Although the client is required to be capable of converting between line terminators, it is strongly recommended that files on the server be maintained in standard Gopher format (that is, CR/LF).
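As a rough illustration of the conversion a client or server might perform, the following Python sketch normalises any of the three conventions to CR/LF and back. The function names `to_crlf` and `to_local` are invented for this example.

```python
import re

# Match CR/LF first so that it is treated as one terminator, not two.
_ANY_EOL = re.compile(r"\r\n|\r|\n")


def to_crlf(text: str) -> str:
    """Convert UNIX (LF), classic Mac OS (CR), or DOS (CR/LF) endings to CR/LF."""
    return _ANY_EOL.sub(lambda _match: "\r\n", text)


def to_local(text: str, eol: str = "\n") -> str:
    """Convert any line endings to the local convention (LF by default)."""
    return _ANY_EOL.sub(lambda _match: eol, text)
```

A server could run `to_crlf` over a file before sending it; a client could run `to_local` over what it receives before saving.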

Capability Files

It is recommended, when hosting a public-access Gopher server, to include a capability file. Although it is, ultimately, the choice of the owner or operator of the server, a capability (or caps) file can be useful for clients querying the server for certain information without using extensions such as Gopher+.

The purpose of a caps file is to let a server instruct a client on how to parse selectors in its filesystem properly; it ensures that the client can understand how files on the server are organised. The scheme used in the current implementation of caps can handle POSIX (UNIX and related operating systems), FAT/NTFS (used by Microsoft Windows), and HFS (used by Apple Mac OS). Due to technical issues, caps files cannot handle VMS or Files-11 paths. Caps files, however, use an open interface; they can be arbitrarily extended.

A caps file is quite simple in its composition: it is a plain text file with no more than seventy characters per line in the root directory of a Gopher server with the name

caps.txt

and beginning with the six characters

CAPS[CR][LF]

Because of the constrained name and location of the caps file, it is a trivial matter to verify whether one exists; the address is always of the form <gopher://gopher.example.com/0/caps.txt>, with the real name of the server substituted for gopher.example.com.

A caps file contains keys, values, and comments.

Keys can be compared to labelled containers for data; for instance, the key ServerSoftware is a container for the name of the Gopher software running on the server. Keys in caps files are always alphanumeric (i.e., composed of letters and numbers only) and generally are in CamelCase (each individual word within the key capitalised). The data in these containers is called a value; values can use letters, numbers, and symbols. Keys and values are connected by the equals (=) sign. Any amount of whitespace (spaces and tabs) around the equals sign is acceptable.

Anything not conforming to the syntax

SomeKey = Value

is ignored (treated as a comment). To be standards-compliant, comments must begin with a hash (#) sign; more importantly, they must be on a line of their own.

Below is an example caps file.

CAPS
# These four characters must be at the beginning to identify the file
# as successfully fetched.

# This is a caps file. This contains a list of key=value pairs that are
# useful to clients wishing to query the server for special information
# without using extensions such as Gopher+. Not all clients support caps
# queries, so your site should be navigable without it. This is an
# optional feature and is not currently a Gopher protocol standard.
# Blank lines and lines starting with # are ignored.
# Clients should cache the information where possible.
# Some servers may automatically generate caps files for you as a pseudo
# selector. In that case, this file may serve only to supersede the
# machine-generated keys. You should read your server documentation.
#
# To use this file, customize it and place it in your server's root mountpoint
# such that a fetch for selector "caps.txt" will retrieve it.
#
# All keys are optional. Not all keys listed here need be specified, and
# in fact many sites won't specify all of them. The client should be
# prepared to deal with that too.

### CAPS META PROPERTIES ###
#
# Spec version of this caps file. This should be the first key specified
# so that an incompatible later format might be detected by the client.
CapsVersion=1

# This tells the client the recommended caps cache expiry time, in seconds.
# This particular property tells the client to refetch the caps file after
# an hour has passed, preferentially. This is optional for the client to
# implement.
ExpireCapsAfter=3600

### PATH SECTION. USE THESE DEFAULT VALUES IF YOU ARE ON A POSIX FILESYSTEM ###

# This tells the client how to cut up a selector into a breadcrumb menu.
# This is a simple ASCII string. If it is not specified, the selector is
# treated as if it were opaque. The client may collapse consecutive
# delimiters (e.g., x//y is treated as x/y) except if PathParentDouble is
# true (for Mac).
PathDelimeter=/

# This tells the client what the "identity" path is, i.e., it can treat
# this as a no-op, turning x/./y into x/y. If this is not specified, the
# literal path . is used.
PathIdentity=.

# This tells the client what the parent path is, i.e., it can treat this
# as a path instruction to delete previous path, turning x/y/../z into x/z
# If this is not specified, the literal path .. is used.
PathParent=..

# This tells the client that consecutive path delimiters are treated as
# parent (mostly for Mac HFS prior to Mac OS X), e.g., turning
# MacHD:x:y:::z into MacHD:z. If this is not specified, it is default FALSE.
PathParentDouble=FALSE

# This tells the client the escape character for quoting the above
# metacharacters. Most of the time this is \. If this is not specified,
# no escape characters are used.
PathEscapeCharacter=\

# This tells the client not to cut everything up to the first path delimiter.
# Normally caps makes gopher://x/11/xyz and gopher://x/1/xyz both into /xyz
# assuming your server is happy with the latter URL (almost all will be).
# If this is not specified, it is by default FALSE. This should be TRUE
# *only* if your server requires URLs like gopher://x/0xyz (i.e., the
# selector should NOT start with the path delimiter).
PathKeepPreDelimeter=FALSE

### OTHER PROPERTIES ###
#
# Some clients will or may make use of these; some won't.

# Freetext description of the server software and server hardware.
ServerSoftware=Bucktooth
ServerSoftwareVersion=0.2.9
ServerArchitecture=AIX
ServerDescription=IBM Power 520 Express, dual 4.2GHz POWER6 CPU, 8GB RAM
ServerGeolocationString=Southern California, USA

# Special server features.
ServerSupportsStdinScripts=TRUE

# An E-mail contact for the server.
ServerAdmin=gopher@floodgap.com
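
A client might read a caps file such as the one above roughly as follows. This is a sketch under stated assumptions: the function name `parse_caps` is invented, the parser accepts any line terminator rather than insisting on CR/LF, and when a key is repeated it simply keeps the last value.

```python
import re

# A key is alphanumeric; whitespace around "=" is allowed; the rest is the value.
_PAIR = re.compile(r"^([A-Za-z0-9]+)[ \t]*=[ \t]*(.*)$")


def parse_caps(text: str) -> dict:
    """Return the key/value pairs of a caps file as a dictionary of strings."""
    lines = text.splitlines()
    if not lines or lines[0] != "CAPS":
        raise ValueError("not a caps file: missing CAPS header")
    caps = {}
    for line in lines[1:]:
        match = _PAIR.match(line)
        if match:  # anything else (comments, blank lines) is ignored
            caps[match.group(1)] = match.group(2).rstrip()
    return caps
```

Since all keys are optional, callers would look values up with a default, for example `caps.get("PathDelimeter")`, and treat the selector as opaque when the key is absent.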

Robots Files

(This section needs work!)

Selector Formats

Type Codes

The following item types are defined by RFC 1436:

Type    Treat As    Meaning
0       TEXT        Plain text file
1       MENU        Menu
2       EXTERNAL    CCSO flat database (formerly used as telephone 
                    directories); other databases
3       ERROR       Error message
4       TEXT        Macintosh BinHex file
5       BINARY      Binary archive (zip; rar; 7-Zip; gzip; tar)
6       TEXT        UUEncoded archive
7       INDEX       Query a search engine or CGI script
8       EXTERNAL    Telnet to: VT100 series server
9       BINARY      Binary file (see also 5)
+       -           Redundant server (TODO: How is this used?)
T       EXTERNAL    Telnet to: tn3270 series server
g       BINARY      GIF format graphics file (TODO: Why not use I?)
I       BINARY      Any image file.

Additionally, the following item types have been in common use and are standardised here. If a client cannot display a particular item type, it should treat it as a more generic item type, passing it off to the operating system (itemtype p “implies” itemtype 0, etc.).

Type    Treat As    Meaning
c       BINARY      Calendar file (Kim Holviala)
d       BINARY      Word-processing document (MS Word; OpenOffice.org; 
                    WordPerfect); PDF document
h       TEXT        HTML document
i       -           Informational text (not selectable)
p       TEXT        Page layout or markup document (TeX; LaTeX;
                    PostScript; Rich Text Format); these documents are
                    all plain text, but contain ASCII "tags" that make
                    the document prettier when sent through a special
                    programme.
m       BINARY      Electronic mail repository (also known as MBOX) (Kim 
                    Holviala)
s       BINARY      Audio recordings (files that consist of audible, but no
                    visible, data) (Wesley Teal)
x       TEXT        eXtensible Markup Language document (Wesley Teal)
;       BINARY      Video files (files that consist of both audible and
                    visible data) (Wesley Teal)

For standards compliance, the 4, 6, h, p, and x filetypes are sent as text (itemtype 0); this way, the text appears directly on the user’s terminal without being downloaded (unless the user explicitly asks the client to save it, e.g. “Save As…”). It is vital to note that text information can safely be sent as binary (at the cost of the minor inconvenience that it is downloaded rather than displayed), as binary transfers carry a greater range of data than ASCII. However, binary files, if sent as text, will be irreparably ruined, as this effectively passes raw eight-bit data through an ASCII filter. In case of doubt, the owner/operator of the server should simply mark the file as binary to ensure that it transfers safely.
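
The "Treat As" column of the two tables above could be held in a simple lookup table. In this Python sketch, the names `TREAT_AS` and `transfer_mode` are invented, and the fallback to BINARY for unlisted types is an assumption based on the advice that binary is the safe choice in case of doubt.

```python
# Item type -> "Treat As" value, copied from the two tables above.
TREAT_AS = {
    "0": "TEXT", "1": "MENU", "2": "EXTERNAL", "3": "ERROR",
    "4": "TEXT", "5": "BINARY", "6": "TEXT", "7": "INDEX",
    "8": "EXTERNAL", "9": "BINARY", "+": "-", "T": "EXTERNAL",
    "g": "BINARY", "I": "BINARY",
    "c": "BINARY", "d": "BINARY", "h": "TEXT", "i": "-",
    "p": "TEXT", "m": "BINARY", "s": "BINARY", "x": "TEXT",
    ";": "BINARY",
}


def transfer_mode(item_type: str) -> str:
    """Look up how an item type is treated.

    Assumption: types not listed in the tables are treated as BINARY,
    since binary transfer cannot damage the data.
    """
    return TREAT_AS.get(item_type, "BINARY")
```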

Gopher Menus

Menu (type 1) content has the following format:

T<itemtext>^I<selector>^I<host>^I<port>

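Where:

* T is the single-character item type
* <itemtext> is the text displayed to the user
* <selector> is the selector string the client sends to retrieve the item
* <host> is the name of the server that holds the item
* <port> is the port on which that server listens
* ^I is the horizontal tab character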

Note on ‘i’ item type: For the ‘i’ item type, Selector, Host, and Port are mostly ignored, but must be present anyway. These should be dummy values. (Floodgap.com uses a blank selector, the host error.host, and port 1.) The one exception to their being ignored is TITLE entries, which have TITLE as the selector value.
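
A minimal Python sketch of parsing menu content in the format above is given here. The names `MenuItem`, `parse_menu_line`, and `parse_menu` are invented for the example; ignoring any fields after the port and stopping at a lone full stop are assumptions about how a tolerant client might behave.

```python
from typing import List, NamedTuple


class MenuItem(NamedTuple):
    item_type: str
    text: str
    selector: str
    host: str
    port: int


def parse_menu_line(line: str) -> MenuItem:
    """Split one menu line into its type character and tab-separated fields."""
    line = line.rstrip("\r\n")
    if not line:
        raise ValueError("empty menu line")
    fields = line[1:].split("\t")
    if len(fields) < 4:
        raise ValueError("menu line needs text, selector, host, and port")
    text, selector, host, port = fields[:4]  # extra fields are ignored
    return MenuItem(line[0], text, selector, host, int(port))


def parse_menu(data: str) -> List[MenuItem]:
    """Parse a whole menu, stopping at a terminating full stop if present."""
    items = []
    for line in data.splitlines():
        if line == ".":
            break
        if line:
            items.append(parse_menu_line(line))
    return items
```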

Note on the terminating full stop

The original RFC specified that a terminating full stop (.) character, followed by a newline, should be sent on a line by itself after the end of the content. However, it also made exceptions for binary data. This terminating full stop has caused no end of trouble ever since. Many, if not most, modern Gopher servers omit this terminating full stop. The following practice is therefore suggested.

Titles in Gopher

The previous governing document for Gopher (RFC 1436) did not allow for menus with titles. When one simply browses about Gopherspace, this does not matter; for bookmarking and Gopher crawlers, such as Veronica-2, however, this presents a large problem.

A Gopher TITLE resource has the following format:

i<titletext>^ITITLE^Iexample.com^I0

It is identical to a standard informational resource (itemtype i); the selector string, however, is set to the specific value, “TITLE”.

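The composition of the above format is as follows:

* i is the informational item type
* <titletext> is the text of the title
* TITLE is the literal selector value that marks the item as a title
* example.com and 0 are dummy host and port values, which are ignored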

A Gopher client that conforms to the above TITLE standard shall render it in one of two ways, depending on the placement of the resource. If the TITLE is the first resource in the document, it shall be considered its principal TITLE and is to be used wherever a principal title is needed (window headings, bookmarks, etc.); furthermore, it is to be rendered in a different size, font, and/or colour to the remainder of the document. In all other cases, it shall be considered a subordinate TITLE and is to be rendered in a different size, font, and/or colour to the remainder of the document, but smaller and/or with less emphasis than the main title.

If a non-compliant Gopher client receives a TITLE resource as described above, it will render it as plain informational text. As the main TITLE must be on the first line of a menu, it will appear visually similar to a title in any case, although not rendered as such.
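
The distinction between principal and subordinate titles might be drawn as in the following Python sketch. The function name `split_titles` and the representation of menu items as `(item_type, text, selector)` tuples are assumptions made for illustration.

```python
def split_titles(items):
    """Separate the principal TITLE from subordinate ones.

    `items` is a sequence of (item_type, text, selector) tuples in menu
    order. Returns (principal_title_or_None, list_of_subordinate_titles).
    """
    principal = None
    subordinate = []
    for index, (item_type, text, selector) in enumerate(items):
        if item_type == "i" and selector == "TITLE":
            if index == 0:
                principal = text  # first resource in the menu
            else:
                subordinate.append(text)
    return principal, subordinate
```

A client could use the first returned value for window headings and bookmarks, and render the others as lesser headings.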

Linking to Web Addresses

It is now possible, and standard, to link from Gopher to documents (preferably in HTML) on the World Wide Web, Gopher’s younger, more widespread cousin. This uses a two-part system: a URL: selector on the Gopher (local) end, and a redirect page (following the rules set out below) on the HTTP (remote) end. Servers need not follow any compliance requirements except for the bulleted list following the example redirect page; if the list is not followed, the server shall be deemed non-compliant.

A client adhering to this specification will, when it sees a Gopher selector with a path starting with URL:, interpret the path as a URL. It will ignore the host and port components of the Gopher selector, using those components from the URL instead, if applicable.

The use of URL: selectors should, wherever possible, be avoided; this is especially true when it is possible to otherwise link to the content and protocol needed. The following URL types are specifically excluded from being linked to by this method:

Authors should avoid links to any document not of HTML type whenever possible. Linking to non-HTML documents will break compatibility with non-compliant Gopher browsers.

A Gopher URL: selector takes the following format:

h<itemtext>^IURL:<address>^I<localhost>^I<localport>

URL: selectors are, generally, identical to standard HTML selectors, but composed of particular data:

It is possible for a non-compliant Gopher client to follow a link to an HTML page, as long as the server is compliant, by the following means: when the client receives a command to follow a URL: selector, it will contact the server that provided the menu, as the originating host and port are mandatory as per this standard.

When a Gopher server receives a request from a client beginning with the string URL:, it will write out an HTML document that redirects the browser to the appropriate place. A conforming example of such a document is as follows:

<HTML>
<HEAD>
<META HTTP-EQUIV="refresh" content="2;URL=http://www.example.com/">
</HEAD>
<BODY>
You are following an external link to a Web site.  You will be
automatically taken to the site shortly.  If you do not get sent
there, please click
<A HREF="http://www.example.com/">here</A> to go to the web site.
<P>
The URL linked is:
<P>
<A HREF="http://www.example.com/">http://www.example.com/</A>
<P>
Thanks for using Gopher!
</BODY>
</HTML>

This document may take any form the server authors desire, but it must adhere to the following requirements. If it does not, the server shall be deemed non-compliant.

When a non-compliant Gopher client finds a reference to an HTML file (type h), it will fetch the file via Gopher, receive the redirect document, and hand it to a Web browser. The Web browser will then be redirected to the actual link destination.

Compliant Gopher clients will simply render the target directly.
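
Both halves of this mechanism can be sketched in Python. The helper names `url_from_selector` and `redirect_page` are invented here, and the generated page is only a pared-down imitation of the example redirect document; it makes no claim to satisfy the compliance requirements.

```python
import html
from typing import Optional

URL_PREFIX = "URL:"


def url_from_selector(selector: str) -> Optional[str]:
    """Client side: return the target URL if this is a URL: selector, else None."""
    if selector.startswith(URL_PREFIX):
        return selector[len(URL_PREFIX):]
    return None


def redirect_page(url: str, delay: int = 2) -> str:
    """Server side: build a small HTML page that redirects a browser to `url`."""
    safe = html.escape(url, quote=True)  # keep the URL from breaking the markup
    return (
        "<HTML>\n<HEAD>\n"
        f'<META HTTP-EQUIV="refresh" content="{delay};URL={safe}">\n'
        "</HEAD>\n<BODY>\n"
        "You are following an external link to a Web site.\n<P>\n"
        f'<A HREF="{safe}">{safe}</A>\n'
        "</BODY>\n</HTML>\n"
    )
```

A compliant client would call `url_from_selector` on each type h item and open the result directly; a server would call `redirect_page` when a request line begins with URL:.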

Algorithm to use with selectors

Here is a description of a hypothetical algorithm for parsing item types, splitting them into levels of interaction.

PROTOCOL
--------
Type    Description     What to do
0       Brief text      Render directly line by line.
1       Menu            Request and analyse menu.  If it contains '3' error
                        node, print error.  Else, render menu in new
                        window.
7       Index/Search 
        Server  

DATA NODES
----------
Type            Description     What to do
4, 9, g, I, c,  Binary file     Request and analyse file.  If it contains 
d, m, s, ;                      '3' error node, print error.  Else, does
                                plug-in exist? If yes, display.  If no, 
                                save to disc.  
6, p, x         Text file       Request and analyse file.  If it contains
                                '3' error node, print error.  Else, print
                                on screen.
h, 2, 8, T      Link            Treat as URL.
5               Archive File    Request and analyse file.  If it contains 
                                '3' error node, print error.  Else, does
                                plug-in exist? If yes, display.  If no, 
                                save to disc.  

For instance, if the client is text-only and therefore incapable of handling images, the algorithm above would have it save the image to disc.
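
The table above could be expressed as a small dispatch function, sketched here in Python. The function name `choose_action` and the action labels are invented; the check for a '3' error node is left out for brevity, the label returned for type 7 is only a placeholder because the table leaves that action unspecified, and saving unknown types to disc is an assumption.

```python
TEXT_TYPES = ("6", "p", "x")
LINK_TYPES = ("h", "2", "8", "T")
BINARY_TYPES = ("4", "5", "9", "g", "I", "c", "d", "m", "s", ";")


def choose_action(item_type: str, has_viewer: bool = False) -> str:
    """Pick what a client does with an item, following the table above."""
    if item_type == "0":
        return "render-text"
    if item_type == "1":
        return "render-menu"
    if item_type == "7":
        return "index-search"  # placeholder: the table gives no action
    if item_type in TEXT_TYPES:
        return "print-on-screen"
    if item_type in LINK_TYPES:
        return "treat-as-url"
    if item_type in BINARY_TYPES:
        return "display" if has_viewer else "save-to-disc"
    return "save-to-disc"  # assumption for types not in the table
```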

Representation of Gopher Addresses

This section is greatly indebted to RFC 4266.

A Gopher address, or uniform resource locator, takes the form:

gopher://<host>:<port>/<gopher-path>

where <gopher-path> is one of:

* <gophertype><selector>
* <gophertype><selector>%09<search>
* <gophertype><selector>%09<search>%09<gopher+_string>

If :<port> is omitted, the port defaults to 70. <gophertype> is a single-character field to denote the Gopher type of the resource to which the URL refers. The entire <gopher-path> may also be empty, in which case the delimiting “/” is also optional and the <gophertype> defaults to “1”.

<selector> is the Gopher selector string. Selector strings are arbitrary sequences of characters; they may not, however, contain the characters corresponding to horizontal tab, line feed, or carriage return. Gopher clients specify which item to retrieve by sending the Gopher selector string to a Gopher server. It is important to know that within the <gopher-path> itself, there are no reserved characters, so one may be arbitrarily creative when creating selector names.

Note that some Gopher <selector> strings begin with a copy of the <gophertype> character, in which case that character will occur twice consecutively. The Gopher selector string may be an empty string; this is how Gopher clients refer to the top-level directory on a Gopher server.

If the URL refers to a search to be submitted to a Gopher search engine, the selector is followed by an encoded tab (%09) and the search string. To submit a search to a Gopher search engine, the Gopher client sends the <selector> string (after decoding), a tab, and the search string to the Gopher server.
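
The rules in this section can be turned into a small URL parser, sketched below in Python. The function name `parse_gopher_url` and the shape of its return value are assumptions made for the example, and any Gopher+ string after a second %09 is simply ignored.

```python
from urllib.parse import unquote, urlsplit


def parse_gopher_url(url: str):
    """Split a gopher:// URL into (host, port, gophertype, selector, search).

    `search` is None when the URL carries no search string.
    """
    prefix = "gopher://"
    if not url.lower().startswith(prefix):
        raise ValueError("not a gopher URL")
    authority, _slash, path = url[len(prefix):].partition("/")
    # urlsplit is used only for the host and port; the path is handled by
    # hand because no characters are reserved within it.
    parts = urlsplit(prefix + authority)
    host = parts.hostname
    port = parts.port or 70
    if not path:
        return host, port, "1", "", None  # empty path: type defaults to 1
    gophertype, rest = path[0], path[1:]
    fields = rest.split("%09")  # encoded tabs separate selector and search
    selector = unquote(fields[0])
    search = unquote(fields[1]) if len(fields) > 1 else None
    return host, port, gophertype, selector, search
```

For an index item, a client would then send the decoded selector, a tab, and the search string, followed by CR/LF.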

References

Brain dump

This is a brain dump of some other things to include at some point.

/robots.txt Server SHOULD add it ! xDIAMORPHINEx: NO. IT SHOULD BE THE PREFERENCE OF THE OP TO USE THEM; FURTHERMORE, RESTRICTIVE ROBOTS FILES SHOULD BE RECOMMENDED AGAINST.

Errors: Errors should be indicated by returning a gopher menu whose first item is of type 3. However, many servers use various heuristics and may return images, HTML pages, or other non-error-menu content.

Mime-multipart encoded files should use type ‘M’ and base64 encoding

What about the ‘+’ item-type? No info in RFC 1436

Attachment: gopherspec.md
Description: Binary data

