
Web-bot tarpit aka spider trap (was: swamp rat bots Q)



hdv@gmail (12020-12-04):
> Let me offer you an alternative option. (Most) bots work by analysing the
> links on each page of your website, right? So, why not add a link to a
> page that normal users will never visit (e.g. because they do not see the
> link and thus will never click on it), but that will show up in a bot's
> analysis? That way you can monitor your logs for entries containing that
> page. Every entity requesting that specific URL can then be blocked.
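For illustration, the monitoring half of that idea could be as simple as
scanning the access log for the trap URL; the URL, log path and log format
below are only placeholder assumptions:

# Rough sketch of the log-monitoring half of the hidden-link idea; the trap
# URL and the log path are placeholders, and the regex assumes a file in
# common log format.
import re

TRAP_URL = "/do-not-follow/"              # the page only a bot would request
LOG_FILE = "/var/log/apache2/access.log"  # adjust to your setup

# common log format: client address first, request line in double quotes
line_re = re.compile(r'^(\S+) .* "(?:GET|POST|HEAD) (\S+)')

def offenders(log_path):
    hits = set()
    with open(log_path) as log:
        for line in log:
            m = line_re.match(line)
            if m and m.group(2).startswith(TRAP_URL):
                hits.add(m.group(1))
    return hits

if __name__ == "__main__":
    # print one address per line, ready to feed to a firewall or deny list
    for addr in sorted(offenders(LOG_FILE)):
        print(addr)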

This made me think of something.

A long time ago, a friend of mine implemented something called the Book
of Infinity to trap badly-behaved robots: a set of deterministic
pseudo-random pages linking to sub-pages ad infinitum, with ever-growing
URLs.
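
A minimal sketch of the idea (not the original code), assuming a tiny WSGI
app with an arbitrary fan-out and port: each page is derived
deterministically from its own URL, so nothing is stored server-side, and
every page links to sub-pages with longer URLs:

# Sketch of a "Book of Infinity"-style spider trap: the URL itself is the
# seed, so the same page always comes back for the same URL, and each page
# links to deeper pages with ever longer URLs.  Fan-out and port are
# arbitrary assumptions.
import hashlib
from wsgiref.simple_server import make_server

LINKS_PER_PAGE = 5   # assumed fan-out per page

def page_for(path):
    # Derive deterministic pseudo-random content from the path itself.
    seed = hashlib.sha256(path.encode()).hexdigest()
    base = path if path.endswith("/") else path + "/"
    links = "".join(
        '<li><a href="%s%s/">chapter %s</a></li>'
        % (base, seed[i * 4:i * 4 + 4], seed[i * 4:i * 4 + 4])
        for i in range(LINKS_PER_PAGE)
    )
    return ("<html><body><h1>Book of Infinity</h1><p>Page %s</p>"
            "<ul>%s</ul></body></html>" % (seed[:12], links))

def app(environ, start_response):
    body = page_for(environ.get("PATH_INFO", "/")).encode()
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()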

As it happened, it was not actually a good idea: it released a lot of
CO₂, and the very badly behaved robots had to be blacklisted from
exploring it. (At some point, we had the same problem when Googlebot
tried to brute-force our online make-your-own-adventure book, but
Googlebot heeds robots.txt.)

But it could be coupled with techniques inspired by spam tarpits: have
the server reply at a crawl to force the bots to waste resources, while
keeping the resource consumption on the server strictly bounded.
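
A rough sketch of that coupling, assuming an asyncio server with an
arbitrary port, delay and connection cap: the reply is dripped out one byte
at a time, an event loop keeps each trapped connection down to a single
coroutine, and a semaphore puts a hard bound on how many connections are
being tarpitted at once:

# Rough tarpit sketch, not a full server: drip the reply out very slowly so
# the bot wastes its time, while a semaphore bounds how many connections we
# are willing to tarpit at once.  Port, delay and cap are assumptions.
import asyncio

MAX_TRAPPED = 100        # hard bound on concurrent tarpitted connections
SECONDS_PER_BYTE = 1.0   # crawl speed of the reply
BODY = b"<html><body>Please wait...</body></html>"

async def main():
    trapped = asyncio.Semaphore(MAX_TRAPPED)

    async def handle(reader, writer):
        await reader.readline()          # read (and ignore) the request line
        if trapped.locked():             # over the cap: drop it, stay bounded
            writer.close()
            return
        async with trapped:
            writer.write(b"HTTP/1.1 200 OK\r\n"
                         b"Content-Type: text/html\r\n\r\n")
            try:
                for byte in BODY:        # one byte per write, one pause each
                    writer.write(bytes([byte]))
                    await writer.drain()
                    await asyncio.sleep(SECONDS_PER_BYTE)
            finally:
                writer.close()

    server = await asyncio.start_server(handle, "", 8001)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())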

Oh, I just noticed I was not the first one to think of it: Wikipedia
tells me it's called a spider trap.

https://en.wikipedia.org/wiki/Spider_trap

Regards,

-- 
  Nicolas George
