
Re: packages with invalid maintainer fields



Thomas Bushnell BSG <tb@becket.net> writes:

> This is not universal, but it is extremely common.  Even someone as
> patient and decent as you doesn't give a hint in your email here that if
> your rule is over-broad and drops a valid message, you would regard that
> as something you must fix, or apologize for, or anything other than
> "this is the price that (other people) have to pay."

If I had a spam filtering rule that was over-broad and caught a message
that wasn't spam, I would feel bad about that, and about inconveniencing
the sender.  To the extent that I could, I would try to ensure that
similar non-spam messages were not caught by the spam filter in the
future.  One of the reasons why I use a statistical filter like bogofilter
is so that there is a well-defined way for me to do that through filter
training.

However, in some cases, it may not be possible to fix the problem.  The
person may have run afoul of some rule that Stanford had to apply
site-wide for some reason (such as to prevent DoS attacks on our mail
servers), or they may be triggering some rule with such a high weight and
such a proven track record at catching spam that the single data point is
not statistically significant.  In such cases, I would explain to the
other person that I consider the false positive to be a flaw in my spam
filtering method, but I can't find a way to fix the flaw that wouldn't
cause worse problems for me.

In other words, I think that spam filtering false positives are bugs, but
some bugs are wontfix.

On the specific case of sending mail directly from dialup IP addresses, I
would strongly recommend against ever doing this at present because, by
doing so, one is putting oneself in a statistical bucket that is
*overwhelmingly* spam.  To a first approximation, all mail direct from
dialups is spam.  I personally prefer scoring filters at every level, but I
know some people who have simply banned all mail from dialups, and when
they show me the statistics they're dealing with, I can't help but admit
that it makes sense for them to do what they do.  This is particularly the
case for people who receive orders of magnitude more spam than my paltry
few thousand a day (and such people most certainly exist).

Being cautious about what statistical bucket one puts oneself into when
communicating has been standard advice on the Internet for decades.  Even
long before the advent of spam, it's always been the case that certain
ways of expressing oneself made it far more likely that people would
ignore one's messages (like writing in all caps, in a language the other
person didn't understand, or to inappropriate addresses).  In an ideal
world, everyone would listen to all communication in direct proportion to
the amount of useful content in that communication.  Alas, in the real
world, we all have limited time and have to optimize expenditure of that
time, and we tend to do that statistically by dropping whole classes of
communication that have very bad signal-to-noise track records.

> So part of what's going on in the shift from by-hand false positives to
> automated false positives is a little bit of the old "blame it on the
> computer".

I do agree, and I think that's unfortunate.  I don't think that viewing
this as a question of fault is particularly useful.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>


