
Web-bot tarpit aka spider trap (was: swamp rat bots Q)



hdv@gmail (12020-12-04):
> Let me offer you an alternative option. (Most) bots work by analysing the
> links on each page of your website, right? So, why not add a link to a
> page that normal users will never visit (e.g. because they do not see the
> link and thus will never click on it), but that will show up in a bot's
> analysis? That way you can monitor your logs for entries containing that
> page. Every entity requesting that specific URL can then be blocked.
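For illustration, the monitoring half of that idea could be as simple as
scanning the access log for the trap URL; the URL, log path and log format
below are only placeholder assumptions:

# Rough sketch of the log-monitoring half of the hidden-link idea; the trap
# URL and the log path are placeholders, and the regex assumes a file in
# common log format.
import re

TRAP_URL = "/do-not-follow/"              # the page only a bot would request
LOG_FILE = "/var/log/apache2/access.log"  # adjust to your setup

# common log format: client address first, request line in double quotes
line_re = re.compile(r'^(\S+) .* "(?:GET|POST|HEAD) (\S+)')

def offenders(log_path):
    hits = set()
    with open(log_path) as log:
        for line in log:
            m = line_re.match(line)
            if m and m.group(2).startswith(TRAP_URL):
                hits.add(m.group(1))
    return hits

if __name__ == "__main__":
    # print one address per line, ready to feed to a firewall or deny list
    for addr in sorted(offenders(LOG_FILE)):
        print(addr)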

This made me think of something.

A long time ago, a friend of mine implemented something called the Book
of Infinity to trap badly-behaved robots: a set of deterministic
pseudo-random pages linking to sub-pages ad infinitum, with ever-growing
URLs.
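
A minimal sketch of the idea (not the original code), assuming a tiny WSGI
app with an arbitrary fan-out and port: each page is derived
deterministically from its own URL, so nothing is stored server-side, and
every page links to sub-pages with longer URLs:

# Sketch of a "Book of Infinity"-style spider trap: the URL itself is the
# seed, so the same page always comes back for the same URL, and each page
# links to deeper pages with ever longer URLs.  Fan-out and port are
# arbitrary assumptions.
import hashlib
from wsgiref.simple_server import make_server

LINKS_PER_PAGE = 5   # assumed fan-out per page

def page_for(path):
    # Derive deterministic pseudo-random content from the path itself.
    seed = hashlib.sha256(path.encode()).hexdigest()
    base = path if path.endswith("/") else path + "/"
    links = "".join(
        '<li><a href="%s%s/">chapter %s</a></li>'
        % (base, seed[i * 4:i * 4 + 4], seed[i * 4:i * 4 + 4])
        for i in range(LINKS_PER_PAGE)
    )
    return ("<html><body><h1>Book of Infinity</h1><p>Page %s</p>"
            "<ul>%s</ul></body></html>" % (seed[:12], links))

def app(environ, start_response):
    body = page_for(environ.get("PATH_INFO", "/")).encode()
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()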

As it happened, it was not actually a good idea: it released a lot of
CO₂, and the very badly behaved robots had to be blacklisted from
exploring it. (At some point, we had the same problem when Googlebot
tried to brute-force our online make-your-own-adventure book, but
Googlebot heeds robots.txt.)

But it could be coupled with techniques inspired by spam tarpits: have
the server reply at a crawl to force the bots to waste resources, while
keeping the resource consumption on the server strictly bounded.
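
A rough sketch of that coupling, assuming an asyncio server with an
arbitrary port, delay and connection cap: the reply is dripped out one byte
at a time, an event loop keeps each trapped connection down to a single
coroutine, and a semaphore puts a hard bound on how many connections are
being tarpitted at once:

# Rough tarpit sketch, not a full server: drip the reply out very slowly so
# the bot wastes its time, while a semaphore bounds how many connections we
# are willing to tarpit at once.  Port, delay and cap are assumptions.
import asyncio

MAX_TRAPPED = 100        # hard bound on concurrent tarpitted connections
SECONDS_PER_BYTE = 1.0   # crawl speed of the reply
BODY = b"<html><body>Please wait...</body></html>"

async def main():
    trapped = asyncio.Semaphore(MAX_TRAPPED)

    async def handle(reader, writer):
        await reader.readline()          # read (and ignore) the request line
        if trapped.locked():             # over the cap: drop it, stay bounded
            writer.close()
            return
        async with trapped:
            writer.write(b"HTTP/1.1 200 OK\r\n"
                         b"Content-Type: text/html\r\n\r\n")
            try:
                for byte in BODY:        # one byte per write, one pause each
                    writer.write(bytes([byte]))
                    await writer.drain()
                    await asyncio.sleep(SECONDS_PER_BYTE)
            finally:
                writer.close()

    server = await asyncio.start_server(handle, "", 8001)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())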

Oh, I just noticed I was not the first one to think of it: Wikipedia
tells me it's called a spider trap.

https://en.wikipedia.org/wiki/Spider_trap

Regards,

-- 
  Nicolas George
