Re: automating sa-learn via cyrus mailbox?



On Sat, 28 Aug 2004 22:31:54 -0500, Will Trillich <will@serensoft.com> wrote:

> so why not create a "user.spam" cyrus mailbox, BOUNCE any spams
> there and have cron do some sort of automated "sa-learn --spam"
> on the results, and then delete them?

I do something similar, but currently only for one user on a local
machine. I save all _human confirmed_ spam to a folder (=spam), and
cron runs a perl script which runs sa-learn and then deletes the messages.

> anybody doing anything like this? got code i could sniff before
> i work up my own wheel from scratch?

Something simple like

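# Learn from every message in the spam folder, capturing sa-learn's output.
my $output = `sa-learn --spam Mail/spam/cur/* Mail/spam/new/*`;
# -f keeps rm quiet when a glob matches nothing.
system "rm -f Mail/spam/cur/* Mail/spam/new/*";
# Print (and so trigger a cron email) only if the output is not the usual
# one-line summary. The pattern is deliberately loose about the wording.
print $output unless $output =~
    /Learn(?:ed|t) (?:tokens )?from \d+ message\S* \(\d+ message\S* (?:examined|inspected)\)/;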

The regexp is off the top of my head; it's only there to stop needless
cron emails when sa-learn prints nothing but its usual summary. It
wouldn't be necessary if http://bugs.debian.org/268035 were implemented.
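
The snippet above deletes the messages whether or not sa-learn succeeded.
A slightly more defensive variant might look like the sketch below. It
assumes a Maildir-style folder at ~/Mail/spam; the function names and the
exact wording of sa-learn's summary line are my assumptions, so check the
pattern against what your installed version prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: Maildir-style spam folder, as in the snippet above.
my $SPAM_DIR = ($ENV{HOME} || '.') . '/Mail/spam';

# Return the regular files in the folder's cur/ and new/ directories.
sub spam_files {
    my ($dir) = @_;
    my @files;
    for my $sub (qw(cur new)) {
        opendir(my $dh, "$dir/$sub") or next;
        push @files, grep { -f $_ } map { "$dir/$sub/$_" } readdir($dh);
        closedir($dh);
    }
    return @files;
}

# Run a command (array ref) with the files appended, bypassing the shell.
# Returns the raw wait status ($?) and whatever the command wrote to stdout.
sub run_learner {
    my ($cmd, @files) = @_;
    open(my $fh, '-|', @$cmd, @files)
        or return (-1, "cannot run $cmd->[0]: $!\n");
    my $output = do { local $/; <$fh> };
    close($fh);
    return ($?, defined $output ? $output : '');
}

# True when the output is nothing but the usual one-line summary.
# Assumed wording; adjust to match your sa-learn.
sub is_routine {
    my ($output) = @_;
    return $output =~
        /\ALearned (?:tokens )?from \d+ message\(s\) \(\d+ message\(s\) examined\)\.?\s*\z/;
}

# Learn, then delete only on success. Returns the text cron should mail,
# or '' when there is nothing worth reporting.
sub learn_and_purge {
    my ($dir, $cmd) = @_;
    my @files = spam_files($dir);
    return '' unless @files;    # empty folder: stay silent
    my ($status, $output) = run_learner($cmd, @files);
    if ($status != 0) {
        return "sa-learn failed (wait status $status); messages kept\n$output";
    }
    unlink @files;
    return is_routine($output) ? '' : $output;
}

print learn_and_purge($SPAM_DIR, [qw(sa-learn --spam)]);
```

Passing the file list straight to sa-learn, rather than a shell glob,
means an empty folder produces no output at all and a failed run leaves
the spam in place for the next cron pass.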

Please note I emphasise human-confirmed: any mail filtered by SA is
put in =filtered (or =filtered-7, in my case) and not automatically
sa-learn'd. If you got a single false-positive and you were feeding
SA's output into sa-learn, you'd strengthen the association which
caused the false positive.

I plan on scaling this up for multiple users by using IMAP. Each user
can save spam to their `spam' folder over IMAP, so there's no need
for them to use mutt or read mail locally, and none of the headers go
missing the way they do when forwarding. Failing that, I'd set up an
address, spam@the.domain, to forward/bounce spam to (but of course
the header problem would exist then).

Finally, the filtering works best if it sees a source of ham, too. So,
once I've got everything sorted, I'd like to automatically scan all
other folders (i.e. everything except =spam, =filtered...) and learn
those as ham.
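
As a rough idea of how that ham pass could pick its folders, here is a
sketch. It assumes every folder is a Maildir directly under ~/Mail, which
is a hypothetical layout, and the function names are made up for
illustration. It only prints the sa-learn command rather than running it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: each folder is a Maildir directly under ~/Mail.
my $MAIL_ROOT = ($ENV{HOME} || '.') . '/Mail';

# Folders that must never be learned as ham: =spam and anything starting
# with "filtered" (which covers =filtered and =filtered-7).
sub is_excluded {
    my ($name) = @_;
    return ($name eq 'spam' || $name =~ /\Afiltered/) ? 1 : 0;
}

# Return the full paths of all Maildir folders under $root that are
# eligible for ham learning, sorted by name.
sub ham_folders {
    my ($root) = @_;
    opendir(my $dh, $root) or return;
    my @names = grep {
        !/\A\./ && -d "$root/$_/cur" && !is_excluded($_)
    } readdir($dh);
    closedir($dh);
    return map { "$root/$_" } sort @names;
}

# Build the sa-learn argument list, or an empty list if nothing qualifies.
# Only cur/ is passed, so messages still sitting unseen in new/ are skipped.
sub ham_command {
    my ($root) = @_;
    my @dirs = map { "$_/cur" } ham_folders($root);
    return @dirs ? ('sa-learn', '--ham', @dirs) : ();
}

# Dry run: show the command instead of executing it.
my @cmd = ham_command($MAIL_ROOT);
print "@cmd\n" if @cmd;
```

Restricting the pass to cur/ is a judgment call: mail the user has
already seen and left in place is a safer bet for ham than mail nobody
has looked at yet.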

-- 
Jon Dowland
dowland@gmail.com
