
Re: SpamAssassin rules problems (was Re: SpamBouncer)



On Fri, Feb 08, 2002 at 12:48:55AM -0800, Blars Blarson wrote:
| In article <20020208022035.GB1164@dman.ddts.net> dsh8290@rit.edu writes:
| [spamassassin]
| >| The default rule scoring seems pretty far off to me though.
| >Can you expand on this?
| 
| (These comments are based on the few dozen mainly spam messages I've fed
| to "spamassassin -t", and some reading of the spamassassin mailing list
| archives.)
| 
| Low scores for some obvious spam-only indicators (javascript -- no
| valid mail will ever contain javascript)

That's a good point.

| Any html is a strong spam indicator.

Depends on the user.  Some groups of people tend to send HTML, or both
HTML and plain text.  (not that I condone it)

| High scores for some things that could easily be tripped by valid email.
| (common spam phrases)

The spam phrase scoring was messed up in 2.01 by a typo (or a thinko)
in one of the arithmetic expressions.
 
| Negative score for long messages.  Long messages are more likely to be
| spam, not less.

Depends on the context of the message.  Some people will write a lot.
A newsletter could be long.  A message that includes lots of log
output or system details can be long too.

| The current auto-whitelist implementation seems to have some problems.

Yep.  It's going to be fixed in 2.02.  (that's what's holding back the
release)

| I haven't yet figured out how to configure which DNSBLs are used.

I don't know how easy that is.  The config file shows that a Perl
function is called to run those tests.  Perhaps a Perl function needs
to be written for each list?
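
For what it's worth, the lookup itself is the same whichever list is
used, so in principle only the zone name has to change.  Below is a
minimal sketch of a generic DNSBL check.  It is not SA's code; the
function names and the zone "dnsbl.example" are made up for
illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone.

    The octets are reversed and the zone is appended, so 192.0.2.1
    checked against "dnsbl.example" becomes "1.2.0.192.dnsbl.example".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record (needs network).

    A failed lookup is treated as "not listed" in this sketch.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

With something like this, picking which lists to use comes down to
which zone names you pass in.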
 
| It only seems to catch about 60% of the spam that gets past my other
| filters.  (ordb, osirusoft, blarsbl, valid rDNS of relay, valid domain
| in envelope from) (These catch about 90% of the spam, and an occasional
| valid email.)
| 
| I think most of these problems stem from the mail corpus their scores
| are based on being very different from the mail I receive.

This is the likeliest cause of your problems with it.  If you can
build your own corpus then you can run the genetic algorithm (GA)
yourself and get scores tailored to your mail usage.  One issue the
developers face is coming up with a corpus and scores that work for
everyone.  For geeks, any message talking about making money is
probably spam.  For the management of a company (whose mail admin runs
SA), newsletters and the like that discuss markets and making money
are wanted mail, not spam.

About the only thing I can suggest for you, if you really want to give
SA another chance, is to customize the scores for your usage.
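
To illustrate why the scores matter so much, here is a rough sketch of
additive rule scoring: each rule that matches contributes its score,
and the message is tagged once the total reaches a threshold.  The
rule names, the scores and the threshold below are all invented for
the example and are not SA's real defaults.

```python
# Hypothetical rule names and scores, invented for this example.
DEFAULT_SCORES = {
    "JAVASCRIPT": 1.0,
    "HTML_MESSAGE": 0.5,
    "MONEY_PHRASE": 6.0,
    "LONG_MESSAGE": -0.5,
}

# Assumed threshold for this sketch.
THRESHOLD = 5.0


def total_score(hits, scores):
    """Sum the scores of every rule that matched the message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, scores, threshold=THRESHOLD):
    """Tag the message when the summed score reaches the threshold."""
    return total_score(hits, scores) >= threshold


# Overrides for someone who never gets legitimate HTML or javascript
# mail but does read long financial newsletters.
MY_SCORES = {
    **DEFAULT_SCORES,
    "JAVASCRIPT": 4.0,
    "HTML_MESSAGE": 2.0,
    "MONEY_PHRASE": 1.0,
    "LONG_MESSAGE": 0.0,
}

html_js_spam = ["JAVASCRIPT", "HTML_MESSAGE"]
newsletter = ["MONEY_PHRASE", "LONG_MESSAGE"]

print(is_spam(html_js_spam, DEFAULT_SCORES), is_spam(html_js_spam, MY_SCORES))
print(is_spam(newsletter, DEFAULT_SCORES), is_spam(newsletter, MY_SCORES))
```

With the made-up defaults the HTML/javascript spam slips through and
the newsletter gets tagged; with the overrides it is the other way
around.  In SA itself the override is, as far as I know, a "score"
line for the rule in your config.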

-D

-- 

An anxious heart weighs a man down,
but a kind word cheers him up.
        Proverbs 12:25


