
Re: Recent page visit statistics



I prepared a list of bot user-agents when I analysed the
www.debian.org logs. It does not cover all bots, but most of those
that send their own user-agent string. I did not try to exclude IP addresses,
but there's a list of good bots: https://github.com/AnTheMaker/GoodBots

This is my regex file for grep -vf to exclude some bots:


Wget
curl/
Go-http-client
Blackbox Exporter
Zabbix
check_http/v
ahrefs.com/robot
Amazonbot
Applebot
archive.org_bot
AwarioBot
Baiduspider
bingbot
Bingbot
BLEXBot/
bytedance.com
ClaudeBot/
crawler
Crawler
crawling for movies
dataforseo-bot
Discordbot
DotBot
DuckDuckBot
Exabo
facebot
facebookexternalhit
Googlebot
GPTBot
ManageEngine Endpoint Central
mj12bot.com
MJ12bo
MojeekBot
my-tiny-bot
naver.me
petalbot
qwant.com/bot
Randomized
riddler.io
Scrapy
SemrushBot
SeobilityBot
serpstatbot
SeznamBot
Slurp
spider
Spider
test-bot
APT-CURL
ww.sogou.com
WWW-Mechanize
www.xforce-security.com
yacybot
/yacy.net
YandexBot
maubot/
ask.com/bot
LinkedInBot
Twitterbot
www.instagram.com
spbot.org
openlinkprofiler.org
FeverBot
NaverBot
GrapeshotCrawler
ImagesiftBot
BotPoke
Slackbot
bot.seekport
EyeMonIT Uptime Bot
checkmk-active-http
YisouSpider
muety/website-watcher
conky-curl
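
A minimal sketch of how a pattern file like the one above can be applied with grep -vf. The file names (bots.txt, access.log) and the sample log lines are made up for illustration; only a few entries from the list are used so the example stays self-contained.

```shell
#!/bin/sh
# Hypothetical file names; substitute the real pattern file and access log.
tmpdir=$(mktemp -d)
patterns="$tmpdir/bots.txt"
log="$tmpdir/access.log"

# A few entries from the list above, one pattern per line.
printf '%s\n' 'Wget' 'curl/' 'Googlebot' 'bingbot' > "$patterns"

# Made-up sample requests in combined log format.
cat > "$log" <<'EOF'
192.0.2.1 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0"
192.0.2.2 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
192.0.2.3 - - [01/Jan/2024:00:00:03 +0000] "GET /distrib/ HTTP/1.1" 200 1024 "-" "curl/8.5.0"
192.0.2.4 - - [01/Jan/2024:00:00:04 +0000] "GET /News/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Macintosh) Safari/605.1.15"
EOF

# -v inverts the match, -f reads the patterns from a file.
filtered=$(grep -vf "$patterns" "$log")

# -c counts lines instead of printing them.
kept=$(grep -cvf "$patterns" "$log")
total=$(grep -c '' "$log")

printf '%s\n' "$filtered"
echo "kept $kept of $total lines"

rm -rf "$tmpdir"
```

Two things to watch for: an empty line in the pattern file matches every log line, so with -v nothing would be left, and adding -i would make case pairs such as bingbot/Bingbot unnecessary.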
-- 
regards Thomas

