Re: IMAP server to fit this bill?

To: debian-user@lists.debian.org
Subject: Re: IMAP server to fit this bill?
From: Steve Lamb <grey@dmiyu.org>
Date: Fri, 19 Mar 2004 11:22:47 -0800
Message-id: <405B4887.2010904@dmiyu.org>
In-reply-to: <20040319170354.GW5671@rudedog.org>
References: <405A55E5.7000202@dmiyu.org> <20040319154139.GU5671@rudedog.org> <405B19E3.4000209@dmiyu.org> <20040319170354.GW5671@rudedog.org>

Dave Carrigan wrote:

> As for putting extra headers into a message, I'm not sure why you think
> this is a problem. That's what headers are for -- to convey
> meta-information about a message.

Because forwarded messages are not the same as the original message. If
the person forwards it as a MIME attachment, for example, the (re)learned
message contains a completely different set of headers as well as a slew of
unrelated MIME encapsulation data. If it is bounced properly, the bounce
headers are learned as either ham or spam. This extraneous information can
lead to false positives or negatives.

How, you ask? Well, it has been my experience that the vast majority of
mail that would need to be retrained is missed spam, by several orders of
magnitude. This means that, depending on the method used and how the
tokenizer breaks down the message, the Bayesian classifier is far more
likely to learn that "messages forwarded/bounced from one account to another
in the domain foo.com are spam" than to see those headers balance out between
the ham and spam corpora, which would keep them, at best, in the undefined
category. The end result is that, with enough training, mail I forwarded to
my fiancee could, for example, be tagged as spam and she'd never get it.
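
A minimal sketch of that skew, assuming a Graham-style per-token estimate
(not any particular classifier's actual implementation). The token name, the
corpus sizes and the 50-to-2 correction ratio are all made up for
illustration; the only point is that a token seen mostly in spam retraining
ends up scored as strongly spammy.

```python
from collections import Counter

def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Graham-style per-token estimate: spam frequency over total frequency."""
    s = spam_counts[token] / n_spam if n_spam else 0.0
    h = ham_counts[token] / n_ham if n_ham else 0.0
    if s + h == 0.0:
        return 0.5  # never seen: neutral, i.e. "undefined"
    return s / (s + h)

# Hypothetical token a tokenizer might emit for a forwarding header.
FWD = "hdr:resent-from:foo.com"

spam_counts, ham_counts = Counter(), Counter()

# Assumption: 1000 messages per corpus were learned directly (no forwarding).
n_spam, n_ham = 1000, 1000

# Corrections arrive by forwarding: 50 missed spam versus 2 false positives.
for _ in range(50):
    spam_counts[FWD] += 1
    n_spam += 1
for _ in range(2):
    ham_counts[FWD] += 1
    n_ham += 1

p_fwd = token_spam_prob(FWD, spam_counts, ham_counts, n_spam, n_ham)
print(f"P(spam | forwarding token) ~ {p_fwd:.2f}")
```

Under these assumed numbers the forwarding token alone scores well above
0.9, even though it says nothing about the content of the message.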

>> It introduces statistics which are meaningless in the final analysis.

> Not sure what this means.

What this means is that even if the ham and spam corpus got the same
number of meaningless statistics, rendering forwarded/bounced message
headers/data "undefined" and therefore unused, it is still data taking up
space in the classifier's DB. From what I have seen, most classifiers
limit the number of entries in their DB, with the little-used entries falling
off far before the common entries. By having known useless entries marked as
ham, spam or undefined (such as from forwarding messages around the same
machine), each becomes a well-used entry that never drops off and adds nothing
new or, at worst, contributes to false positives.
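
To make the DB point concrete, here is a rough sketch assuming a token store
capped at a fixed size that evicts the least recently touched entry. Real
classifiers expire tokens in their own ways; the class name, the cap and the
token names below are hypothetical.

```python
from collections import OrderedDict

class TokenDB:
    """Toy token store with a hard cap; evicts the least recently touched."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # token -> count, oldest-touched first

    def touch(self, token):
        self.entries[token] = self.entries.get(token, 0) + 1
        self.entries.move_to_end(token)  # mark as most recently used
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # drop the stalest entry

db = TokenDB(max_entries=3)
db.touch("rare-but-useful")

# Every retrained message carries the same (hypothetical) forwarding token
# plus one token from its real content.
for i in range(5):
    db.touch("hdr:forwarded")
    db.touch(f"body-token-{i}")

print(list(db.entries))
```

The forwarding token is refreshed on every retrain and so never ages out,
while the rarely seen but meaningful token is the first one evicted.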

>>> Is there a particular reason that you need SMTP scanning?

>> I do not believe it is right to accept and then silently drop messages.

> I never reject or discard messages, and my logs show exactly where every
> message was delivered, down to the final mailbox. There is a possibility
> that I might not see a message, but that doesn't mean it didn't get
> delivered.

And those logs are accessible to all your users? Remember, this isn't
just for me. This is for me and those of my family who have chosen to host
some of their mail on my machine because of the spam prevention I offer them.
They simply do not want to download the spam at all. So if someone sends
them a message which is tagged as spam there are only two options for it.
Either it is rejected at SMTP, so the other side can see from their
logs/bounces that it was rejected, or it must be tagged and delivered. Any
spam over a certain score (8) is rejected. Anything between two scores (5
and 8) is tagged and delivered. I cannot, however, tag all spam and then
deliver it because that serves no purpose at all. Granted, they can filter on
their end, but the whole point is that they don't download it. Filtering
comes after downloading. The 5-8 range allows for a nominal margin of error
while rejecting the most obvious cases outright. So what you're saying is
that you accept all, tag and deliver anyway? Ew.
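
The score policy above can be sketched roughly as follows. The thresholds 5
and 8 come from the text; the function name, the returned labels and the
handling of a score of exactly 5 or exactly 8 are assumptions made for
illustration.

```python
REJECT_ABOVE = 8.0  # from the text: anything over 8 is rejected at SMTP time
TAG_FROM = 5.0      # from the text: between 5 and 8 is tagged and delivered

def smtp_action(score: float) -> str:
    """Map a spam score to what the server does during the SMTP session."""
    if score > REJECT_ABOVE:
        return "reject"           # sender sees the rejection in logs/bounces
    if score >= TAG_FROM:         # assumption: the boundaries are inclusive
        return "tag-and-deliver"  # margin of error: user still gets the mail
    return "deliver"

for score in (2.3, 6.0, 12.5):
    print(score, smtp_action(score))
```

Nothing in this scheme is accepted and then silently dropped: every message
is either refused while the sending side is still connected or ends up in
the mailbox.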

> I'm using mutt and I'm using cyrus, so I'm not sure what this means. If
> you're implying that you can't read your mail without an imap client,
> then I'll concede that. Big deal. For me, the benefits of imap far
> outweigh the disadvantage that there may be some mail clients that I
> can't use.

I did mention elmo as well. I am not familiar with mutt's IMAP
implementation and I'd be willing to wager that it isn't up to par given the
preponderance of things mutt does wrong as well as how often most clients get
IMAP wrong.


--
         Steve C. Lamb         | I'm your priest, I'm your shrink, I'm your
       PGP Key: 8B6E99C5       | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------

