
Re: packages with invalid maintainer fields

Thomas Bushnell BSG <tb@becket.net> writes:
> Russ Allbery <rra@stanford.edu> writes:

>> I adopt no particular spam filtering rules at the SMTP layer, but I use
>> bogofilter (a Bayesian-trained spam filter) to pre-process my mail and
>> weed out the spam.  The chances of me noticing a false positive are
>> non-zero but fairly low.  It is plausible that some users trying to
>> contact me about my packages would have their mail filtered out and
>> thereby receive no response from me.  (It's unlikely, or I wouldn't use
>> this spam filtering method, but bogofilter is not immune to false
>> positives.)

> Do you drop the mail on the floor, give a connection-level error, or
> send a bounce?

For various reasons related to my mail filtering setup, I do not do spam
processing until after mail delivery, which means that the mail is
effectively dropped on the floor.  (Some of it really is; some of it goes
into a spam folder that I scan daily, but the chance of my missing a
false positive among the other 1,000 messages per day is high.)

Once the message has been accepted at the SMTP layer, it's very bad form
to generate a bounce message: the sender address on spam and viruses is
usually forged, so your server then hits innocent third parties with
backscatter.  I still have a few qmail servers in production
use that I need to find time to convert to Postfix, but I try pretty hard
to avoid creating any new backscatter problems and /dev/null frequently
spammed addresses in qmail until I have a chance to do that conversion.

One consequence is that unless you're the original accepting mail server,
you generally should not be bouncing the mail.  If I were
going to bounce the spam, I would have to do it at the stanford.edu mail
servers, not at the final destination mail server.  I doubt Stanford would
be particularly pleased with me embedding my spam detection logic into our
main relay systems.
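
As a rough illustration of that rule, here is a minimal sketch of the
decision.  The names Stage, Action, and disposition are made up for this
example and are not part of bogofilter, qmail, or Postfix; the only point
is that a bounce is never among the options once the mail has been
accepted.

```python
from enum import Enum


class Stage(Enum):
    SMTP_SESSION = "smtp_session"    # sending host is still connected
    POST_DELIVERY = "post_delivery"  # message was already accepted


class Action(Enum):
    DELIVER = "deliver"
    REJECT_5XX = "reject_5xx"        # refuse during the SMTP dialogue
    FILE_AS_SPAM = "file_as_spam"    # keep in a spam folder for review
    DISCARD = "discard"              # drop on the floor


def disposition(stage, looks_like_spam, keep_quarantine=True):
    """Pick what to do with a message (hypothetical helper).

    A rejection is only possible while the sending host is still
    connected.  After acceptance the choices are to file or discard;
    generating a bounce is deliberately not an option here.
    """
    if not looks_like_spam:
        return Action.DELIVER
    if stage is Stage.SMTP_SESSION:
        return Action.REJECT_5XX
    return Action.FILE_AS_SPAM if keep_quarantine else Action.DISCARD
```

Filtering after delivery, as I do, corresponds to calling
disposition(Stage.POST_DELIVERY, True), which can only file or discard.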

Please don't underestimate the problems caused by backscatter.  Nearly all
of my current virus load is backscatter traffic, and each time a new virus
comes out I tend to get a hundred copies and several hundred to a thousand
bounce messages claiming I sent it.

> Really, this isn't mindless on my part: I've been stuck before with this
> problem, and I don't ever want to be stuck again, because some mail
> server has an over eager rule which prevents them from even hearing
> about the problem while it's going on.

I understand your feelings, but the only result you're going to get from
forbidding Debian developers to use many widely used and effective forms
of spam filtering is going to be a sudden lack of Debian developers.  I
will respectfully explain my spam filtering rules to anyone who asks, and
I will try to minimize false positives, but should Debian actually demand
that I go back to dealing with over 5,000 spam messages per day via manual
sorting (and it's probably doubled at this point) in order to participate
in the project, I'd probably have to just find some other project to
contribute to.

> The specific cases *I'm* bothered by are methods which prevent all mail,
> of whatever content, from arriving from particular validly-configured
> hosts.

One component of the rules applied by Stanford's spam filtering setup,
which my bogofilter setup takes as partial input, is a set of host-based
origin rules.  Such rules are used because they are *extremely* effective
(high spam catch rate, very low false positive rate).

Spam filtering is a statistical operation.  I'm not going to stop using
rules that I know are effective via hard statistical evidence for reasons
not backed up by similar hard evidence.  Plus, I really don't think
there's any way that one can achieve, let alone mandate, zero false
positives -- as previously mentioned, purely human spam filtering does not
have zero false positives.

There are some filtering techniques that I personally consider too
aggressive, but I think you're going to be hard-pressed to build any sort
of objective criteria that can actually be measured at a distance.  The
resulting fight over applying any rule like this would, I think, hurt the
project considerably more than occasionally losing mail from users.
Frankly, anyone who uses e-mail these days is used to mail going
occasionally missing; I know average users, not technical experts, who use
spam rejection rules so harsh that I would never consider them.

Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>
