
Re: swamp rat bots Q



On Friday 04 December 2020 17:37:02 Tixy wrote:

> On Fri, 2020-12-04 at 17:06 -0500, Gene Heskett wrote:
> > On Friday 04 December 2020 16:14:29 Tixy wrote:
> > > On Fri, 2020-12-04 at 14:51 -0500, Gene Heskett wrote:
> > > > On Friday 04 December 2020 12:39:24 Reco wrote:
> > > > >       Hi.
> > > > >
> > > > > On Fri, Dec 04, 2020 at 08:39:42AM -0500, Gene Heskett wrote:
> > > > > > But I asked specifically how to enable it for one bot, and
> > > > > > I've asked that question several times, getting smoke and
> > > > > > mirror answers you all assume are helpful, but which are
> > > > > > useless to a new user installing the now 7 years old and
> > > > > > long out of date package that in effect has no "how it
> > > > > > works" docs. I asked 3 questions in a previous day or so
> > > > > > timeline, and no one has actually attempted to answer even
> > > > > > one of them. Here is one line from that log, from a bot
> > > > > > that I just blocked:
> > > > > >
> > > > > > coyote.coyote.den:80 192.99.6.226 - -
> > > > > > [04/Dec/2020:07:18:20 -0500] "GET
> > > > > > /gene/toolshed/c3/build/win32/prep/?C=S;O=D HTTP/1.1" 200
> > > > > > 673 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8;
> > > > > > http://mj12bot.com/)"
> > > > >
> > > > > Taken directly from the link.
> > > > >
> > > > > Bot Type         Good crawler (always identifies itself)
> > > > > IP Range         Distributed, Worldwide
> > > > > Obeys Robots.txt *Yes*
> > > >
> > > > Sorry, they do not, they've read it and ignored it 428 times in
> > > > the life of that log, which I zeroed out around 1 July of this
> > > > year.
> > >
> > > Why would they read it if they were going to just ignore it?
> > > Perhaps your robots.txt is broken? Hint, it is, in 2 or 3
> > > different ways I can see (if it's
> > > http://geneslinuxbox.net:6309/robots.txt we're talking about).
> > > That file doesn't have any syntactically correct entry in there
> > > for blocking that bot.
> >
> > And what might that be like? I'll fix it right now.
>
> OK, I'll do your proofreading...
>
> At the end of the robots.txt you are missing a colon from a rule that
> disallows everything for all bots...
>
> User-agent *
> Disallow: /
>
> That should be:
>
> User-agent: *
> Disallow: /
>
> But if you just want to disable the bot you reckon is a problem, the
> front page of their site (https://mj12bot.com/) says you want:
>
> User-agent: MJ12bot
> Disallow: /
>
> Or you could read their page to see the robots.txt syntax for slowing
> down crawling, which I assume is relevant to other bots you may have
> problems with.
>
> The other rules above your disallow-everything rule (which are
> superfluous if you keep that final rule) also have typos. You have a
> '0' here...
>
> User-0agent: *
> Disallow: /doc/
>
> And this rule has a space in the URL...
>
> User-agent: *
> Disallow: stress test
>
> I'm pretty sure URLs can't have actual space characters in them and
> that must be a typo on your part. Also something I read when looking
> at this issue a few hours ago (but can't find again) reckoned that
> Google's bot let you have multiple statements on a line separated by
> spaces, e.g.
>
Fat fingers syndrome, I've suffered from that for 86 years. Short fat 
fingers that are fond of pressing 2 keys at once. Thanks for pointing it 
out nicely. The unreal part is that I have made an excellent living 
since I was about 14 and quit school to go fix them new-fangled things 
called Televisions in '48 when the first TV station came on the air in
central Iowa. I wound up as the Chief Engineer at a string of TV
stations from the early '70s on. But now I'm 86, eating my own cooking
& trying to keep a 30% pump with some replacement parts running well 
enough to wake up the next morning. And building my own CNC machinery to 
keep me busy and out of the bars. Not much I haven't tried since.

My fingerprints have been to 37,000 feet deep in the mohole as I helped  
build the tv cameras that were on the Navy's Trieste in Feb 1960. They 
say water isn't compressible, but when the outside pressure is nearly 
18,000 psi, it sure is.

> Disallow: foo Disallow: bar
>
> So it seems likely that having a space in the URL like you have isn't
> legal, and could possibly upset parsing.

Anyway, I've fixed what you pointed out, and am watching the log.
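
For anyone else checking their own file, the corrections above can be
sanity-checked offline instead of waiting on the log. This is only a
minimal sketch, assuming Python 3 and its standard urllib.robotparser
module. The allowed() helper and the example.invalid URL are made up for
illustration, and MJ12bot's own parser may treat malformed lines
differently than this one does.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text locally
    and never touches the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rule as it was quoted in the thread: no colon after "User-agent".
BROKEN = """\
User-agent *
Disallow: /
"""

# The corrected per-bot rule quoted from the MJ12bot front page.
FIXED = """\
User-agent: MJ12bot
Disallow: /
"""

# Placeholder URL (assumption); substitute a path from your own site.
URL = "http://example.invalid/gene/toolshed/"

if __name__ == "__main__":
    # Python's parser skips the colon-less line, so nothing is blocked.
    print(allowed(BROKEN, "MJ12bot", URL))       # True
    # With the colon in place the bot is disallowed everywhere.
    print(allowed(FIXED, "MJ12bot", URL))        # False
    # Other crawlers are unaffected by the per-bot rule.
    print(allowed(FIXED, "SomeOtherBot", URL))   # True
```

Pass the bot's short name ("MJ12bot") rather than the whole User-Agent
string from the log; this parser only looks at the part before the first
"/" when matching, so the full "Mozilla/5.0 (compatible; ...)" string
would not match the rule.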

Thank you, a lot. Stay safe and well, Tixy.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>

