
Re: IMAP server to fit this bill?



Dave Carrigan wrote:
As for putting extra headers into a message, I'm not sure why you think
this is a problem. That's what headers are for -- to convey
meta-information about a message.

Because a forwarded message is not the same as the original message. If the person forwards it as a MIME attachment, for example, the (re)learned message contains a completely different set of headers as well as a slew of unrelated MIME encapsulation data. If it is bounced properly, the bounce headers are learned as either ham or spam. This extraneous information can lead to false positives or negatives.

How, you ask? Well, it has been my experience that the vast majority of mail that needs to be retrained is missed spam, by several orders of magnitude. This means that, depending on the method used and how the tokenizer breaks down the message, the Bayesian classifier is far more likely to learn that "messages forwarded/bounced from one account to another in the domain foo.com are spam" than to see those headers balanced between the ham and spam corpora, which would at best keep them in the undefined category. The end result is that, with enough training, mail I forward to my fiancee could be tagged as spam and she'd never get it.
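To make the skew concrete, here is a minimal sketch of a per-token spam probability built from simple message counts. It is not the formula of any particular filter; the class name `TokenStats`, the token `fwd:foo.com` standing in for the forwarding headers, and all of the message counts are assumptions chosen for illustration.

```python
from collections import Counter


class TokenStats:
    """Toy per-token counts for a Bayesian-style filter (illustrative only)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, tokens, is_spam):
        # Count each token at most once per message.
        if is_spam:
            self.spam.update(set(tokens))
            self.n_spam += 1
        else:
            self.ham.update(set(tokens))
            self.n_ham += 1

    def spam_probability(self, token):
        # Fraction of spam containing the token vs. fraction of ham containing it.
        s = self.spam[token] / self.n_spam if self.n_spam else 0.0
        h = self.ham[token] / self.n_ham if self.n_ham else 0.0
        if s + h == 0:
            return 0.5  # never seen: treat as undefined
        return s / (s + h)


def simulate():
    stats = TokenStats()
    fwd = "fwd:foo.com"  # hypothetical token produced by the forwarding headers

    # Mail learned directly, without any forwarding headers (assumed counts).
    for _ in range(100):
        stats.learn(["hello", "meeting"], is_spam=False)
        stats.learn(["cheap", "pills"], is_spam=True)

    # Retraining by forwarding: almost all of it is missed spam.
    for _ in range(50):
        stats.learn(["cheap", "pills", fwd], is_spam=True)
    stats.learn(["hello", fwd], is_spam=False)

    return stats.spam_probability(fwd)


forward_token_probability = simulate()
```

With these assumed counts the forwarding token, which says nothing about the content of a message, ends up scored as a strong spam indicator simply because most retrained mail is missed spam.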

It introduces statistics which are meaningless in the final analysis.

Not sure what this means.

What this means is that even if the ham and spam corpora got the same number of meaningless statistics, rendering forwarded/bounced message headers/data "undefined" and therefore unused, that data still takes up space in the classifier's DB. From what I have seen, most classifiers limit the number of entries in their DB, with the little-used entries falling off far before the common entries. Known useless entries marked as ham, spam or undefined (such as those from forwarding messages around the same machine) become well-used entries that never drop off, add nothing new and, at worst, contribute to false positives.
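The DB pressure described above can be sketched with a size-limited token store. This is only an illustration: it assumes least-recently-used eviction as a stand-in for "little-used entries fall off first", and the class `BoundedTokenDB`, its capacity, and the token names are hypothetical rather than taken from any real classifier.

```python
from collections import OrderedDict


class BoundedTokenDB:
    """Toy token store with a fixed capacity and LRU eviction (an assumption)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def touch(self, token):
        # Record a use of the token and mark it as most recently used.
        self.entries[token] = self.entries.get(token, 0) + 1
        self.entries.move_to_end(token)
        # Drop the least recently used entries once over capacity.
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)

    def __contains__(self, token):
        return token in self.entries


db = BoundedTokenDB(max_entries=3)
# The forwarding token shows up in every retrained message, so it is
# refreshed constantly while the content tokens rotate out.
for content_token in ["a", "b", "c", "d"]:
    db.touch("fwd:foo.com")
    db.touch(content_token)
```

In this run the forwarding token is still in the store at the end, while the earliest content tokens have been evicted to make room.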

Is there a particular reason that you need SMTP scanning?

   I do not believe it is right to accept and then silently drop
messages.

I never reject or discard messages, and my logs show exactly where every
message was delivered, down to the final mailbox. There is a possibility
that I might not see a message, but that doesn't mean it didn't get
delivered.

And those logs are accessible to all your users? Remember, this isn't just for me. This is for me and those of my family who have chosen to host some of their mail on my machine because of the spam prevention I offer them. They simply do not want to download the spam at all. So if someone sends them a message which is tagged as spam there are only two options for it. Either it is rejected at SMTP, so the other side can see from their logs/bounces that it was rejected, or it must be tagged and delivered. Any spam over a certain score (8) is rejected. Anything between two scores (5 and 8) is tagged and delivered. I cannot, however, tag all spam and then deliver it because that serves no purpose at all. Granted, they can filter on their end, but the whole point is that they don't download it. Filtering comes after downloading. The 5-8 range allows for a nominal margin of error while rejecting the most obvious cases outright. So what you're saying is that you accept all, tag and deliver anyway? Ew.
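The policy above comes down to two thresholds. A minimal sketch follows, using the scores 5 and 8 from the text; the function name `smtp_action` and the handling of scores exactly on a boundary are assumptions.

```python
REJECT_SCORE = 8.0  # from the text: anything over this is rejected at SMTP
TAG_SCORE = 5.0     # from the text: between 5 and 8 is tagged and delivered


def smtp_action(score):
    """Return what to do with a message during the SMTP transaction."""
    if score > REJECT_SCORE:
        # Refuse the message so the sending side sees the rejection.
        return "reject"
    if score >= TAG_SCORE:  # boundary handling is an assumption
        # Accept, but mark it so the recipient can spot a possible mistake.
        return "tag"
    return "deliver"
```

Because the decision is made before the message is accepted, nothing is silently dropped: the sender either gets a rejection or the mail reaches the mailbox.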

I'm using mutt and I'm using cyrus, so I'm not sure what this means. If
you're implying that you can't read your mail without an imap client,
then I'll concede that. Big deal. For me, the benefits of imap far
outweigh the disadvantage that there may be some mail clients that I
can't use.

I did mention elmo as well. I am not familiar with mutt's IMAP implementation, and I'd be willing to wager that it isn't up to par, given the preponderance of things mutt does wrong as well as how often most clients get IMAP wrong.

--
         Steve C. Lamb         | I'm your priest, I'm your shrink, I'm your
       PGP Key: 8B6E99C5       | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------
