Re: Web-bot tarpit aka spider trap (was: swamp rat bots Q)
On Friday 04 December 2020 09:00:05 Nicolas George wrote:
> hdv@gmail (12020-12-04):
> > Let me offer you an alternative option. (Most) bots work by
> > analysing the referrals on each page in your website. Right? So, why
> > not add a link to a page that normal users will never visit (e.g.
> > because they do not see the link and thus will never click on it),
> > but will show up in a bot's analysis? That way you can monitor your
> > logs for entries containing that page. Every entity requesting that
> > specific URL is blocked.
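A minimal sketch of that trap, assuming a hypothetical hidden path
/do-not-follow.html and a combined-format access log (the path and
file names here are invented for illustration):

    #!/usr/bin/env python3
    # Sketch: collect the IPs that requested the hidden trap URL from
    # a combined-format access log; every hit is a bot to block.
    import re

    TRAP_PATH = "/do-not-follow.html"  # hypothetical hidden link target
    LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[.*?\] "(?:GET|HEAD) (\S+)')

    offenders = set()
    with open("/var/log/apache2/access.log") as log:
        for line in log:
            m = LOG_LINE.match(line)
            if m and m.group(2) == TRAP_PATH:
                offenders.add(m.group(1))  # client IP that hit the trap

    for ip in sorted(offenders):
        print(ip)  # pipe into an nftables/iptables deny list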
> This made me think of something.
> A long time ago, a friend of mine implemented, to trap the
> badly-behaved robots, something called the Book of Infinity: a set of
> deterministic pseudo-random pages linking to sub-pages ad infinitum,
> with ever growing URLs.
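Something like this could generate those pages: seed a PRNG from the
URL itself, so every page is reproducible without storing anything
and every child link is longer than its parent (a rough sketch on
Python's stock http.server; the port and link count are arbitrary):

    #!/usr/bin/env python3
    # Sketch of a "Book of Infinity": every URL deterministically
    # generates a page of links to longer child URLs, so the tree is
    # infinite while the server stores nothing at all.
    import hashlib
    import html
    import random
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class BookOfInfinity(BaseHTTPRequestHandler):
        def do_GET(self):
            # Same path -> same seed -> same page, forever.
            seed = hashlib.sha256(self.path.encode()).hexdigest()
            rng = random.Random(seed)
            links = "".join(
                '<a href="%s/%04x">chapter %d</a><br>\n'
                % (self.path.rstrip("/"), rng.getrandbits(16), i)
                for i in range(8)
            )
            body = ("<html><body><p>You are at %s</p>%s</body></html>"
                    % (html.escape(self.path), links)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8000), BookOfInfinity).serve_forever()

Serving the whole subtree under a path that robots.txt disallows
keeps the well-behaved crawlers out of it.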
> As it happened, it was not actually a good idea: it released a lot
> of CO₂, and the very badly behaved robots had to be blacklisted
> from exploring it. (At some point, we had the same problem when Googlebot
> tried to brute-force our online make-your-own-adventure book, but
> Googlebot heeds robots.txt.)
> But it could be coupled with techniques inspired by spam tarpits: have
> the server reply at a crawl to force the bots to waste resources,
> while keeping the resource consumption on the server strictly bounded.
> Oh, I just noticed I was not the first one to think of it: Wikipedia
> tells me it's called a spider trap.
Sounds like a good idea; I'll have to think about it. Feed the bots in
256-byte pieces every 5 seconds to keep them from timing out, with 256
bytes from rnd mixed in to make a DoS packet? :) Just be sure the CRC
is valid.
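A rough sketch of that drip-feed with asyncio, so each trapped bot
costs the server little more than an open socket; the 256-byte and
5-second figures are the ones above, everything else is assumption:

    #!/usr/bin/env python3
    # Sketch of the tarpit reply: dribble 256 random bytes every 5
    # seconds so a bot neither times out nor ever finishes, while
    # asyncio keeps the per-connection cost on the server tiny.
    import asyncio
    import os

    CHUNK = 256  # bytes per drip
    DELAY = 5    # seconds between drips

    async def tarpit(reader, writer):
        await reader.read(1024)  # swallow the request and ignore it
        writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")
        try:
            while True:
                writer.write(os.urandom(CHUNK))  # 256 bytes of noise
                await writer.drain()
                await asyncio.sleep(DELAY)
        except ConnectionError:
            pass  # the bot finally gave up
        finally:
            writer.close()

    async def main():
        server = await asyncio.start_server(tarpit, "", 8001)
        async with server:
            await server.serve_forever()

    asyncio.run(main())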
Take care now, Nicolas.
Cheers, Gene Heskett
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>