
Re: Download a whole gopherhole using wget/curl?



It was thus said that the Great kiwidevelops once stated:
> Hi everyone,
> 
> I want to archive as many gopherholes as I can, just in case any of them
> one day shut down or their server stops running and would like to know how
> I can download a gopherhole recursively. 

  As I'm wont to do on the Gemini protocol mailing list, I often create
server content that makes a point [1].  I've done the same here, to show
that there are traps for the unwary.  If you attempt to crawl this link
[2]:

	gopher://gopher.conman.org/1BlackHole:

You'll enter a space with an infinite number of pages.  That is, until
1) your system runs out of space, or (most likely) 2) the client errors out
because it can't handle selectors beyond a certain size [3].  This is just
one example of dynamic content generation.  I don't think there's much of
this in gopher (excluding search output) but it does exist. [4]
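
  For anyone writing their own crawler, here's a rough sketch of the guards
that keep you out of a trap like the one above: bound the selector length,
cap the total page count, and never fetch the same selector twice.  (A
sketch only, assuming Python 3 and nothing beyond the standard library; the
host and the limits are placeholders, not a finished archiving tool.)

	import socket

	HOST, PORT   = "gopher.example.org", 70   # placeholder host
	MAX_SELECTOR = 255                        # refuse suspiciously long selectors
	MAX_PAGES    = 1000                       # hard cap on pages fetched

	def fetch(selector):
	    # One gopher transaction: send selector + CRLF, read until close.
	    with socket.create_connection((HOST, PORT), timeout=10) as s:
	        s.sendall(selector.encode("latin-1") + b"\r\n")
	        data = b""
	        while (chunk := s.recv(4096)):
	            data += chunk
	        return data

	def crawl(start=("1", "")):
	    seen, queue = set(), [start]
	    while queue and len(seen) < MAX_PAGES:
	        itype, sel = queue.pop()
	        if sel in seen or len(sel) > MAX_SELECTOR:
	            continue                      # the length guard
	        seen.add(sel)
	        body = fetch(sel)                 # a real archiver saves this to disk
	        if itype != "1":                  # only menus list further links
	            continue
	        for line in body.decode("latin-1", "replace").splitlines():
	            # menu line: Xdisplay<TAB>selector<TAB>host<TAB>port
	            parts = line.split("\t")
	            if len(parts) >= 4 and parts[2] == HOST and parts[0][:1] in "01":
	                queue.append((parts[0][:1], parts[1]))
	    return seen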

  And as some others have pointed out, some gopherholes are rather large in
size.

> Does anyone know how to properly back up a whole gopherhole? Thank you!

  Ask the site owner politely for a copy of the content?

  -spc

[1]	Or tortures client programs, take your pick.

[2]	If your client attempts to get "/BlackHole:", it's doing things
	wrong.

[3]	And these selectors grow ... up to 16,400 or so bytes long.  RFC-1436 says
	nothing about the length of selectors.

[4]	My site:

		gopher://gopher.conman.org/

	is nearly entirely dynamically generated.  For instance:

		gopher://gopher.conman.org/1Bible:
		gopher://gopher.conman.org/0Bible:Genesis
		gopher://gopher.conman.org/0Bible:Genesis.1
		gopher://gopher.conman.org/0Bible:Genesis.1:1
		gopher://gopher.conman.org/0Bible:Genesis.1-3
		gopher://gopher.conman.org/0Bible:Genesis.1:1-3:24
		gopher://gopher.conman.org/0Bible:Genesis.2:5-17

	All of these are valid pages.  And given I have the entire King
	James Bible here, that's a ton of potential pages.
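
	To make the URL-to-request mapping concrete (a sketch, assuming
	Python 3 and the standard library): the character right after the
	"/" in those URLs is the item type ("0" for a text file, "1" for a
	menu), and only the rest is sent to the server as the selector.

		import socket

		# gopher://gopher.conman.org/0Bible:Genesis.1:1 -> item type "0",
		# selector "Bible:Genesis.1:1"; the type is never sent on the wire.
		with socket.create_connection(("gopher.conman.org", 70), timeout=10) as s:
		    s.sendall(b"Bible:Genesis.1:1\r\n")
		    data = b""
		    while (chunk := s.recv(4096)):
		        data += chunk
		print(data.decode("latin-1", "replace"))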

