
Re: robots.txt (was Re: Download a whole gopherhole using wget/curl?)



On Fri, 29 Nov 2019 22:41:12 +1000,
James Mills <prologic@shortcircuit.net.au> wrote:

> Silly question, but isn't the User-Agent kind of useless here, since a
> Gopher request is basically just a selector for a resource?
> There are no headers
> No User-Agent to identify a request
> 
> What am I missing here :)
> 
Well, the point is that there is nothing to identify server-side:
robots.txt is more or less just an indication from the server of which
parts of the site should not be crawled. Parsing and interpretation are
done entirely on the client side. So if a client finds an entry that it
believes applies to itself, it should act accordingly.
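
For illustration, here is a rough Python sketch of what that client-side
check could look like. The function names, the prefix matching and the
crawler name "eomyidae" in the usage lines are just assumptions for the
example, not anything the protocol prescribes; the only fixed points are
the two conventions discussed in the thread (a lone "*" disallowing
everything and an empty Disallow allowing everything).

    # Rough sketch of client-side robots.txt handling for a gopher crawler.
    # Assumption: the robots.txt text has already been fetched (e.g. via a
    # "/robots.txt" selector); matching here is simple prefix matching.

    def parse_robots(text):
        """Collect Disallow patterns per User-Agent group."""
        rules = {}        # lowercased user-agent -> list of disallow patterns
        agents = []       # user-agents of the group currently being read
        in_group = False  # True once a Disallow line has been seen
        for line in text.splitlines():
            line = line.split('#', 1)[0].strip()
            if not line:
                agents, in_group = [], False   # a blank line ends the group
                continue
            field, _, value = line.partition(':')
            field, value = field.strip().lower(), value.strip()
            if field == 'user-agent':
                if in_group:                   # a new group starts here
                    agents, in_group = [], False
                agents.append(value.lower())
                rules.setdefault(value.lower(), [])
            elif field == 'disallow':
                in_group = True
                for agent in agents:
                    rules[agent].append(value)
        return rules

    def allowed(rules, agent, selector):
        """Decide, purely client-side, whether to fetch this selector."""
        patterns = rules.get(agent.lower(), rules.get('*', []))
        for pattern in patterns:
            if pattern == '':   # empty Disallow: nothing is off limits
                continue
            # a lone "*" blocks everything; otherwise do a prefix match
            if pattern == '*' or selector.startswith(pattern.rstrip('*')):
                return False
        return True

    # Example use (hypothetical crawler name):
    rules = parse_robots("User-Agent: *\nDisallow: /archive/")
    allowed(rules, "eomyidae", "/archive/2019")   # -> False
    allowed(rules, "eomyidae", "/phlog")          # -> True

With that, the first form Christoph mentions (Disallow: *) makes
allowed() return False for every selector, while the second form (an
empty Disallow) never blocks anything, which is exactly the divergence
Sean describes in the quoted text below.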

HTH,
Florian
> Kind Regards
> 
> James
> 
> James Mills / prologic
> 
> E: prologic@shortcircuit.net.au
> W: prologic.shortcircuit.net.au
> 
> 
> On Fri, Nov 29, 2019 at 3:39 PM Sean Conner <sean@conman.org> wrote:
> 
> > It was thus said that the Great Christoph Lohmann once stated:  
> > > Good point. In eomyidae you have two possibilities:
> > >
> > >       User-Agent: *
> > >       Disallow: *  
> >
> >   Okay, but this diverges from the HTTP version of robots.txt (from
> > my understanding, unless it's been updated since I was last dealing
> > with this stuff).
> >  
> > > and
> > >
> > >       User-Agent: *
> > >       Disallow:  
> >
> >   This actually has a different meaning from the HTTP
> > version---there it means "all browsers are allowed to crawl" (back
> > from when robots.txt was first developed).
> >
> >   -spc
> >
> >  

