
Re: swamp rat bots Q



On Friday 04 December 2020 03:49:39 hdv@gmail wrote:

> On 2020-12-03 13:35, Gene Heskett wrote:
> > I've had it with a certain bot that ignores my robots.txt and
> > proceeds to mirror my site, several times a day, burning up my
> > upload bandwidth. They've moved it to 5 different addresses since
> > midnight.
> >
> > I want to nail the door shut on the first attempted access by these
> > AH's.
> >
> > Does anyone have a ready-made script that can watch my httpd "other"
> > log and, if a certain name is at the end of the line, grab the ipv4
> > src address (arg3 of the line) and apply it to iptables DROP rules?
> >
> > Or do I have to invent a new wheel for this?
> >
> > Basic rules that simplify it somewhat.
> >
> > 1. this is ipv4 only country and not likely to change in the future
> > decade.
> >
> > 2. the list of offending bot names will probably never go beyond 50,
> > if that many. 5 would be realistic.
> >
> > 3. the src address in the log is at a fixed offset, obtainable with
> > the bash MID$ but the dns return will need some acrobatics involving
> > the bash RIGHT$ function.
> >
> > 4. it should track the number of hits, and after so many in a /24
> > block, autoswitch to a /16 block in order to keep the rules file
> > from exploding.
> >
> > Any help will be much appreciated. PM's in this case welcome, as I
> > can't see broadcasting our armament against these MF'ers on a public
> > list.
> >
> > Thanks all.
> >
> > Cheers, Gene Heskett
>
> Let me offer you an alternative option. (Most) bots work by following
> the links on each page of your website. Right? So, why not add a
> link to a page that normal users will never visit (e.g. because they
> do not see the link and thus will never click on it), but that will
> show up in a bot's crawl? That way you can monitor your logs for
> entries containing that page. Every entity requesting that specific
> URL is blocked.
>
Now that idea has some merit. Some of the bots are asking for stuff I 
deleted years ago, which cannot be reached through any link that still 
exists, so I am wondering how they find it. If I tracked and recorded 
those, the list would be a long one.
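For what it's worth, hdv's trap-URL idea only takes a few lines of shell. A minimal sketch, assuming the vhost log format shown further down (client address in field 2, requested URL in field 8); the log path and trap URL here are placeholders, not anything from the thread:

```shell
#!/bin/bash
# Honeypot blocker: link an invisible page (e.g. /trap/) from real
# pages and Disallow it in robots.txt, so only misbehaving bots ever
# request it.  Any client that fetches it gets a DROP rule.

LOG=/var/log/apache2/other_vhosts_access.log  # assumed log path
TRAP='/trap/'                                 # assumed trap URL

# Print the client IPv4 address if the log line is a trap hit,
# nothing otherwise.  With the vhost log format the address is
# field 2 and the requested URL is field 8.
trap_hit() {
    awk -v trap="$TRAP" '$8 == trap { print $2 }' <<<"$1"
}

watch_trap() {
    tail -F "$LOG" | while read -r line; do
        ip=$(trap_hit "$line")
        [ -n "$ip" ] && iptables -I INPUT -s "$ip" -j DROP
    done
}

# watch_trap   # uncomment to run; needs root for iptables
```

The robots.txt Disallow line for the trap URL is what makes this work: well-behaved crawlers skip the page, so only the ones ignoring robots.txt ever land on the block list.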

But I asked specifically how to enable it for one bot, and I've asked 
that question several times, getting smoke-and-mirrors answers you all 
assume are helpful, but which are useless to a new user installing the 
now 7-year-old and long out-of-date package that in effect has no "how 
it works" docs. I asked 3 questions in the previous day or so, and no 
one has actually attempted to answer even one of them.
Here is one line from that log, from a bot that I just blocked:

coyote.coyote.den:80 192.99.6.226 - - 
[04/Dec/2020:07:18:20 -0500] "GET /gene/toolshed/c3/build/win32/prep/?C=S;O=D 
HTTP/1.1" 200 673 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; 
http://mj12bot.com/)"

That named file does exist, but it's many years old and outdated, 
related to a computer that is itself nearly 38 years old; it's been 4 
or 5 years since I did an hg pull and rebuilt it. My own version of 
that machine has died of dried-out electrolytic capacitors and no 
longer boots its baby unix os. I am a CET, and have been since 1972, 
so I could fix it, but there comes a time when it's time to let it go. 
I am now building metal-carving machines run by LinuxCNC.

> HTH
>
> HdV


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>

