* Thomas Lange <lange@cs.uni-koeln.de> [250723 10:56]:
I prepared a list of bot user-agents when I analysed the www.debian.org logs. It will not cover all bots, but it covers most of those that send their own user-agent string. I did not try to exclude IP addresses, but there's a list of good bots:
https://github.com/AnTheMaker/GoodBots

This is my regex file for grep -vf to exclude some bots:

Wget
curl/
[..]
From what I've seen in other places, the AI scrapers send (semi-)real User-Agents, mimicking Chrome, Firefox, MSIE, etc.
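
To illustrate the limitation, here is a minimal Python sketch of the same idea as grep -vf: drop every log line that matches one of the patterns. Only the two patterns visible in the quoted list are used, since the rest is elided. The function name, the sample log lines, and the addresses are made up for illustration, and Python's re syntax is not identical to grep's, so treat this as an approximation rather than a drop-in replacement.

```python
import re

# The two patterns visible in the quoted regex file; the rest of that
# list is elided ("[..]") and is not reconstructed here.
BOT_PATTERNS = ["Wget", "curl/"]


def filter_bots(lines, patterns=BOT_PATTERNS):
    """Return the lines that match none of the patterns (roughly grep -vf)."""
    if not patterns:
        # An empty pattern list excludes nothing.
        return list(lines)
    combined = re.compile("|".join(patterns))
    return [line for line in lines if not combined.search(line)]


# Hypothetical log lines, invented for this example.
sample = [
    '192.0.2.1 - - "GET / HTTP/1.1" 200 "-" "Wget/1.21.3"',
    '192.0.2.2 - - "GET / HTTP/1.1" 200 "-" "curl/8.5.0"',
    '192.0.2.3 - - "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (X11; Linux x86_64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"',
]

kept = filter_bots(sample)
for line in kept:
    print(line)
```

The Wget and curl lines are dropped, but the third line passes through no matter who actually sent it, because a scraper that presents a browser-like User-Agent looks the same as a real browser to this kind of filter.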
Chris