Analyzing exim logfiles
I'm trying to analyze my exim logfiles and my spam logs (I use
SpamAssassin), and I'm confused about what exim logs, and when.
I run SpamAssassin site-wide through an exim transport. A site-wide exim
filter looks for the X-spamassassin headers added by the scan and,
depending on the spam score, either dumps the message into the bitbucket
or files it in a folder for further review. The filter also writes a line
to a logfile recording the date, time, spam score, sender, and recipients.
I've written a Perl script to parse that file, and it tells me, for example,
that there were 3208 spams caught last week, with each address in the
$recipients variable counting as one message (i.e., if the spam was sent in
a single SMTP transaction to three addresses at my domain, my script counts
it as three spams).
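For comparison with the mainlog figures, here is a minimal Python sketch of the per-recipient counting described above. The real layout of the filter's logfile isn't given, so it assumes a hypothetical tab-separated line per message (date, time, score, sender, recipients), with the recipients field copied from `$recipients`, which exim expands as a comma-separated list. It returns the message count and the recipient count separately, so that Q and R can be kept apart.

```python
from typing import Iterable, Tuple


def count_spam(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (messages, recipients) from a filter-written spam log.

    Hypothetical layout, one tab-separated line per message:
        date<TAB>time<TAB>score<TAB>sender<TAB>addr1, addr2, ...
    """
    messages = 0
    recipients = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # $recipients is comma-separated; count each non-empty address.
        addresses = [a.strip() for a in fields[4].split(",") if a.strip()]
        messages += 1
        recipients += len(addresses)
    return messages, recipients


if __name__ == "__main__":
    # Made-up sample lines in the assumed layout.
    sample = [
        "2024-03-01\t10:15:02\t12.3\tspammer@example.net\t"
        "a@example.com, b@example.com, c@example.com\n",
        "2024-03-01\t10:16:40\t8.1\tother@example.org\ta@example.com\n",
    ]
    print(count_spam(sample))  # (2, 4)
```

The 3208 figure above corresponds to the second value (recipients); the first value is the one to set against the number of messages received.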
I'm trying to figure out what percentage of my incoming mail that is. If I
run eximstats against the corresponding mainlog, or count the lines
containing "<=" and "=>", I get much smaller numbers: 1776 received and
1854 delivered. Even assuming much of the spam was sent to multiple
addresses, 1776 messages in versus roughly 3200 spams plus some number of
legitimate messages doesn't add up.
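One thing worth checking when counting "=>" lines: exim logs the first address of a delivery with "=>", and any additional addresses delivered in the same transport run with "->". A count of "=>" alone can therefore understate delivered recipients. The sketch below tallies main-log lines by flag, assuming the default timestamp format (optional milliseconds, timezone offset, and `[pid]` fields are tolerated); the sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Date, time (optional .mmm), optional timezone offset, optional [pid],
# then the message ID and a two-character flag followed by a space.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\S*"
    r"(?: [+-]\d{4})?"
    r"(?: \[\d+\])?"
    r" \S+ (<=|=>|->|\*>|\*\*|==) "
)


def tally_flags(lines: Iterable[str]) -> Counter:
    """Count main-log lines by their flag ("<=", "=>", "->", ...)."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2024-03-01 10:15:02 1aaaaa-000001-AA <= s@example.net "
        "H=mx.example.net [192.0.2.1] P=esmtp S=1234\n",
        "2024-03-01 10:15:03 1aaaaa-000001-AA => a@example.com "
        "R=dnslookup T=remote_smtp\n",
        "2024-03-01 10:15:03 1aaaaa-000001-AA -> b@example.com "
        "R=dnslookup T=remote_smtp\n",
        "2024-03-01 10:15:03 1aaaaa-000001-AA Completed\n",
    ]
    counts = tally_flags(sample)
    arrivals = counts["<="]
    addresses = counts["=>"] + counts["->"]  # every delivered address
    print(arrivals, addresses)  # 1 2
```

To run it over a real mainlog, open the file and pass the file object to `tally_flags()`; the log's location depends on how exim was built and packaged.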
What I'm wondering is what counts as an incoming message and what counts as
a delivered message when there are multiple addressees, and how messages
caught by the SpamAssassin transport are logged compared with those actually
delivered. My goal is to be able to say that X messages were received for Y
total recipients, and compare that with the Q spams sent to R total
recipients. (Note that I need to count as "delivered" both messages for my
local domain and those aliased to mailing lists or external domains. I
also want to exclude locally generated system messages, but I can handle
that separately.)
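One way to get X and Y directly, sketched under several assumptions: enable `log_selector = +received_recipients` in the main configuration, so each "<=" line ends with " for " followed by the envelope recipients. Counting recipients at arrival happens before alias expansion, so an address that expands to a mailing list counts once, which should match how `$recipients` is counted in the spam log. If the SpamAssassin transport re-injects scanned messages into exim with a distinct received-protocol name (for example via `-oMr`), each message arrives twice; the protocol name `spam-scanned` below is a hypothetical placeholder to adjust.

```python
import re
from typing import Iterable, Tuple

# Hypothetical: the P= value used when the scanner re-injects a message.
REINJECT_PROTOCOL = "spam-scanned"

ARRIVAL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\S*"  # date, time, optional .mmm
    r"(?: [+-]\d{4})?"                          # optional timezone offset
    r"(?: \[\d+\])?"                            # optional [pid] field
    r" \S+ <= (.*)$"                            # message ID, arrival flag, rest
)
PROTO_RE = re.compile(r"(?:^| )P=(\S+)")


def arrivals(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (messages, recipients) from "<=" lines.

    Assumes +received_recipients, so arrival lines end with
    " for addr1 addr2 ...". Re-injected copies are skipped.
    """
    messages = recipients = 0
    for line in lines:
        m = ARRIVAL_RE.match(line.rstrip("\n"))
        if not m:
            continue
        rest = m.group(1)
        proto = PROTO_RE.search(rest)
        if proto and proto.group(1) == REINJECT_PROTOCOL:
            continue  # same message coming back from the scanner
        messages += 1
        if " for " in rest:
            recipients += len(rest.rsplit(" for ", 1)[1].split())
    return messages, recipients


def spam_share(spam: int, total: int) -> float:
    """Percentage of total accounted for by spam (0.0 if total is 0)."""
    return 100.0 * spam / total if total else 0.0


if __name__ == "__main__":
    sample = [
        "2024-03-01 10:15:02 1aaaaa-000001-AA <= s@example.net "
        "H=mx.example.net [192.0.2.1] P=esmtp S=1234 "
        "for a@example.com b@example.com c@example.com",
        "2024-03-01 10:15:03 1aaaaa-000002-AA <= s@example.net "
        "U=mail P=spam-scanned S=1400 "
        "for a@example.com b@example.com c@example.com",
        "2024-03-01 10:16:00 1aaaaa-000003-AA <= friend@example.org "
        "H=mail.example.org [192.0.2.2] P=esmtp S=900 for a@example.com",
    ]
    x, y = arrivals(sample)
    print(x, y)  # 2 4
    # Hypothetical spam-log counts: 1 spam message sent to 3 recipients.
    print(spam_share(1, x), spam_share(3, y))  # 50.0 75.0
```

With real data, X and Y would come from `arrivals()` over the mainlog, Q and R from the spam log, and the two percentages from `spam_share(Q, X)` and `spam_share(R, Y)`.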
Any help in figuring out how to match up these stats would be appreciated.