
Re: Every spam is sacred



On Thu, 12 Jun 2003, Duncan Findlay wrote:
> On Thu, Jun 12, 2003 at 08:00:58PM +0200, Santiago Vila wrote:
> > I agree every developer should be able to be on this variable by
> > request, but if we were able to block 50% or 60% of spam by a very
> > simple method and with very very few false positives, it would be
> > stupid not to estimate how many false positives there would really be
> > and we will never know whether or not there will be many, few or no
> > false positives until we try it in warn mode.
>
> I question your statistics.

This is the data I have:

list.dsbl.org    252 mails   63.5%
sbl.spamhaus.org  36 mails    9.1%

from a total of 397 spam messages received in May.
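For anyone who wants to check the arithmetic, the percentages follow directly from the counts. This is only a small sketch; the names (TOTAL_SPAM, hits, hit_rate) are made up for illustration, and the counts are the ones listed above.

```python
# Counts taken from the May data above.
TOTAL_SPAM = 397
hits = {"list.dsbl.org": 252, "sbl.spamhaus.org": 36}


def hit_rate(count, total):
    """Return the share of messages that were listed, as a percentage."""
    return 100.0 * count / total


# Round to one decimal place, as in the table above.
rates = {zone: round(hit_rate(n, TOTAL_SPAM), 1) for zone, n in hits.items()}

for zone, rate in rates.items():
    print(f"{zone:18s} {rate:5.1f}%")
```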

The DNS lookups were made after I received all the spam. I agree this
is not as accurate as it might be, but on the other hand, the data
comes from real spam sent to a real @debian.org address.
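As a reminder of what such a lookup involves: a DNSBL is queried by reversing the octets of the sending IP address and appending the list's zone. An address record in the answer means "listed", and a "no such name" answer means "not listed". The sketch below only builds the query name and does no network access. The function name is hypothetical, and 192.0.2.1 is a documentation-only address, not one taken from my data.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    # Validate the input; raises ValueError for anything that is not IPv4.
    addr = ipaddress.IPv4Address(ip)
    # DNSBLs use the octets in reverse order, like in-addr.arpa.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Example with a documentation-only address (an assumption for illustration).
example = dnsbl_query_name("192.0.2.1", "list.dsbl.org")
print(example)
```

The resulting name would then be handed to an ordinary resolver. Doing the lookup some time after delivery, as I did, may give a different answer than a lookup at SMTP time would have.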

As an estimate, "half of the spam" should be a good one.

> As some of you may know, I'm involved in the upstream development of
> SpamAssassin. We have tested various RBLs and I agree that
> lists.dsbl.org is one of the best RBLs out there. sbl.spamhaus.org
> is not a great RBL.

I'm happy to hear that not all DNSBLs are evil :-)

> This is an excerpt from our test results. (S/O = spam/overall, rank
> and score are relatively meaningless)
>
> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>   92163    48993    43170    0.532   0.00    0.00  (all messages)
> 100.000  53.1591  46.8409    0.532   0.00    0.00  (all messages as %)
>  20.883  39.2668   0.0185    1.000   0.98    2.23  RCVD_IN_DSBL
>   6.917  12.6937   0.3614    0.972   0.87    0.56  RCVD_IN_SBL
>
> What this means:
> lists.dsbl.org hits roughly 39% of spam at the expense of 0.019% of ham
> sbl.spamhaus.org hits 13% of spam at the expense of 0.36% ham
>
> Using both would likely hit about 45% of spam at the expense of 0.4%
> of ham. Not 50-60%... and not with "very very few false positives".

Why 45%? Do they overlap so much?
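The two per-list figures alone only bound the combined hit rate, so here is a rough sketch of the possible range. It uses the SPAM% and HAM% values from the excerpt above. The variable names are hypothetical, and the "independent" figure assumes the two lists hit spam independently of each other, which is only an assumption and probably not true in practice.

```python
# Per-list hit rates in percent, from the quoted SpamAssassin excerpt.
dsbl_spam, sbl_spam = 39.2668, 12.6937
dsbl_ham, sbl_ham = 0.0185, 0.3614

# Total overlap: every SBL hit is also a DSBL hit.
lower = max(dsbl_spam, sbl_spam)

# No overlap at all: the two rates simply add up.
upper = min(100.0, dsbl_spam + sbl_spam)

# Assumption: the lists are statistically independent.
independent = 100.0 * (1 - (1 - dsbl_spam / 100) * (1 - sbl_spam / 100))

# Worst case for ham: no overlap between the false positives.
ham_upper = dsbl_ham + sbl_ham

# Overlap (in percentage points of spam) implied by a combined rate of 45%.
implied_overlap = dsbl_spam + sbl_spam - 45.0

print(f"combined spam hit rate: {lower:.1f}% to {upper:.1f}%")
print(f"if independent:         {independent:.1f}%")
print(f"ham hit rate at most:   {ham_upper:.2f}%")
print(f"overlap implied by 45%: {implied_overlap:.1f} points")
```

So a combined figure of 45% would sit inside the possible range, and it would mean that roughly 7 of the 12.7 points hit by the SBL are also hit by the DSBL.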

BTW: According to these SpamAssassin tests, are there any other good DNSBLs
which perform as well as the DSBL in terms of avoiding false positives?

> And in case you're wondering about the accuracy of the data... I can
> assure you that it's quite good. (Using only recent mail, carefully
> classified, etc)

Probably not directed to @debian.org addresses. Your mileage may vary.

> I am not averse to using lists.dsbl.org on Debian machines providing
> there's an easy way to opt out. However, using sbl.spamhaus.org is not
> a good idea.

We could use it in warning mode and look for false positives anyway.

It's entirely possible that the false positive rate for mail directed
to @debian.org addresses is different from the rate in the tests
performed by SA.

It could be higher, or it could be lower, but we will never know for
sure unless we test it in warn mode.

> I still don't think Debian should enforce a filtering policy on
> developers e-mail address. It should really be done on an individual
> basis.

recipients_reject_except may be set on an individual basis.

I still think that not rejecting obvious spam (open relays, open
proxies, etc.) is not a good "default".


