
Re: [gopher] The Gopher Archive



On 22.4.2010 23:07, Brian Koontz wrote:

> I'm archiving Teh Gopher. All of it - well all textual searchable
> information, not binaries nor images...
>
> The next logical step would be to set up a mechanism to mirror the
> archive, because we all know what happens when one large repository
> suddenly goes down for what will likely be forever (hal3000.cx,
> anyone?).

Replace $ROOT with whatever directory you want to keep the files in.

$ rsync rsync.gophernicus.org::archive/
drwxr-xr-x        4096 2010/04/19 15:51:09 .
drwxr-xr-x        4096 2010/04/23 01:06:42 sites

$ rsync -avz --progress rsync.gophernicus.org::archive/ $ROOT/
receiving incremental file list
created directory  $ROOT
./
sites/
sites/last
          29 100%   28.32kB/s    0:00:00 (xfer#1, to-check=1066/1069)
sites/1/
sites/1/155.198.1.33:70/
sites/3/
sites/3/gopher.386server.info:70/
[...]

The archive directory structure is pretty simple: all of the sites are under, uh, sites/ (more directories are still being added there), and they are grouped by the first character (letter or digit) of the primary domain name.

So, for example, port 70 of gopher.floodgap.com can be found in $ROOT/sites/f/gopher.floodgap.com:70/
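A small shell sketch of that lookup. The rule for picking the "primary domain name" is an assumption here: the hypothetical `site_dir` helper takes the second-to-last dot-separated label of the host (floodgap in gopher.floodgap.com). That reproduces the three paths shown above, but it will probably file hosts under suffixes like .co.uk differently from the real archive.

```shell
# site_dir ROOT HOST PORT  (hypothetical helper, not part of the archive)
# Prints the directory where the mirror is assumed to keep HOST:PORT.
site_dir() {
    # Assumed "primary domain name": second-to-last label of HOST,
    # or HOST itself when it contains no dots.
    label=$(printf '%s\n' "$2" | awk -F. '{ if (NF > 1) print $(NF-1); else print $1 }')
    # Bucket directory: the first character of that label.
    first=$(printf '%s\n' "$label" | cut -c1)
    # Strip one trailing slash from ROOT so the path has no "//".
    printf '%s/sites/%s/%s:%s/\n' "${1%/}" "$first" "$2" "$3"
}
```

Usage: `ls "$(site_dir "$ROOT" gopher.floodgap.com 70)"` lists the floodgap directory mentioned above.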

Each site directory has one or more subdirectories; the archived files are under cache/. Inside cache/ there are one-character directories named after the first hex digit of the MD5 sum of the original selector. Each downloaded file is saved under the MD5 sum of its selector as the filename, and contains some MIME-style headers, a double CRLF, and then a bit-perfect, unmodified copy of the original file.

Uh, complicated.
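The whole cache lookup boils down to one MD5 sum. A minimal sketch, assuming md5sum from GNU coreutils is available; `cache_path` is a hypothetical helper name, not something the archive ships:

```shell
# cache_path SITE_DIR SELECTOR  (hypothetical helper)
# Prints the path of the cached copy of SELECTOR under SITE_DIR.
cache_path() {
    # MD5 of the raw selector bytes. printf '%s' adds no trailing newline
    # and does not interpret any % or \ that appears in the selector.
    sum=$(printf '%s' "$2" | md5sum | cut -c1-32)
    # Bucket directory: the first hex digit of the sum.
    bucket=$(printf '%s\n' "$sum" | cut -c1)
    printf '%s/cache/%s/%s\n' "${1%/}" "$bucket" "$sum"
}
```

For example, `cache_path "$ROOT/sites/f/gopher.floodgap.com:70" /archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT` should arrive at the same file as the md5sum and ls steps in the floodgap example. Using `printf '%s'` rather than putting the selector in the format string keeps selectors containing `%` from being mangled.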

Let's take this file from floodgap:
/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT

$ printf "/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT" | md5sum

e9c26adf54530a785378971bbac7cd23  -

$ ls -la $ROOT/sites/f/gopher.floodgap.com\:70/cache/e/e9c26adf54530a785378971bbac7cd23

-rw-r--r-- 1 kimmy users 1334 2010-04-23 14:12 $ROOT/sites/f/gopher.floodgap.com:70/cache/e/e9c26adf54530a785378971bbac7cd23

$ head -20 $ROOT/sites/f/gopher.floodgap.com:70/cache/e/e9c26adf54530a785378971bbac7cd23

Location: gopher://gopher.floodgap.com:70/0/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT
Host: gopher.floodgap.com:70
Filetype: 0
Selector: /archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT
Referer: gopher://gopher.floodgap.com:70/1/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80
Name: 00-INDEX.TXT
Title: /archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT
Date: 2010-Apr-23 11:12
Timestamp: 1272021150
Size: 892

MYZ80111.ZIP 105339 05-22-93 V1.11 Of Simeon Cran's CP/M emulator for the
                               | PC. This is Simeon Crans' complete CP/M
                               | package for the PC. It needs a 286 (or
                               | better) to run and is packed with goodies,
                               | such as the ability to run CP/M 2.2 or 3.0,
                               | 32-bit processor aware, multitasker aware,
                               | ADM3A/Televideo emulation, complete key
                               | re-mapping, etc etc. You've tried the rest,
                               | now try the BEST!! I haven't seen a better
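To get the original file back out of a cache entry, skip everything up to and including the first empty line. Below is a sketch with two hypothetical helpers. It assumes the header block ends at the first empty line and accepts either CRLF or bare LF line endings there. It copies the body with `tail -c`, so the bytes come out unmodified, binary or not.

```shell
# cache_body FILE  (hypothetical helper)
# Writes the original file body to stdout, skipping the header block.
cache_body() {
    # Line number of the first empty line; a lone CR also counts as empty.
    hdr_lines=$(awk '/^\r?$/ { print NR; exit }' "$1")
    [ -n "$hdr_lines" ] || return 1
    # Byte length of the headers plus the blank separator line.
    hdr_bytes=$(head -n "$hdr_lines" "$1" | wc -c)
    # tail -c +N starts at byte N (1-based): copy the body byte for byte.
    tail -c +"$(( hdr_bytes + 1 ))" "$1"
}

# cache_header FILE NAME  (hypothetical helper)
# Prints the value of header NAME (e.g. Location, Size), if present.
cache_header() {
    awk -v name="$2" '
        /^\r?$/ { exit }                 # end of the header block
        { sub(/\r$/, "") }               # drop a trailing CR, if any
        index($0, name ": ") == 1 { print substr($0, length(name) + 3); exit }
    ' "$1"
}
```

For example, with `f="$ROOT/sites/f/gopher.floodgap.com:70/cache/e/e9c26adf54530a785378971bbac7cd23"`, `cache_body "$f" > 00-INDEX.TXT` restores the original file and `cache_header "$f" Location` prints its gopher URL. If Size is the byte count of the original file (an assumption), `cache_body "$f" | wc -c` should print the same number as `cache_header "$f" Size`.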




- Kim

_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/gopher-project



