Re: Spam in bugs database - automatic removal?
On Thu, 13 Feb 2003, H. S. Teoh wrote:
> On Thu, Feb 13, 2003 at 05:47:55PM +0000, Colin Watson wrote:
> > bugs.debian.org has certain advantages here in that valid mail tends to
> > follow fixed formats. We've scored down /^Package:/ and X-Debbugs-Cc:,
> > and Bug# is already scored down by SpamAssassin.
> Unfortunately, some spammers appear to have stumbled upon putting
> "bug#nnnn" in the message body. Did you see the one that came to -devel
> yesterday? It had a couple o' bug numbers on it, and seems to have escaped
> -devel's spamfilter because the BTS bounced it.
Yes, I'm sure they saw it. No, the messages didn't have "bug#nnnn" in
the body; they only had it in the subject line. I assume the BTS adds
it to outgoing messages after the message has been checked with
SpamAssassin.
> Also, keep in mind that bugs submitted against SA itself may contain spam
> attachments (eg. a real spam mail to demonstrate an SA bug). Or does the
> current setup ignore attachments?
False positives are not such a big problem in the BTS, because the BTS
people can put a message back into the system if needed.
The point is: every time a spam message gets through, they have to
clean it up, and every time there is a false positive, they have to
put it back into the system. If, for example, 95% of the messages
received by the BTS with a SpamAssassin score between 4.0 and 5.0 are
spam, the logical thing for the BTS people to do is to lower the
threshold: they get a little more work putting false positives back
into the system and a lot less work cleaning it up, which reduces the
overall work required to keep the BTS clean.
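
To put rough numbers on that trade-off, here is a minimal sketch in
Python. The counts (95 spam and 5 legitimate messages in the 4.0-5.0
band) and the equal cost of one unit for cleaning up a spam versus
restoring a false positive are assumptions for illustration, not
measurements from the BTS; manual_work is a hypothetical helper.

```python
def manual_work(spam_count, ham_count, band_is_filtered,
                cleanup_cost=1, restore_cost=1):
    """Manual work caused by the messages in one score band.

    If the band is filtered (threshold lowered below it), only the
    legitimate messages need attention: each must be restored.
    If the band is let through, every spam in it must be cleaned up.
    Costs are in arbitrary units and are assumed, not measured.
    """
    if band_is_filtered:
        return ham_count * restore_cost
    return spam_count * cleanup_cost


# Assumed example: 100 messages score between 4.0 and 5.0, 95% are spam.
work_if_let_through = manual_work(95, 5, band_is_filtered=False)
work_if_filtered = manual_work(95, 5, band_is_filtered=True)

print(work_if_let_through)  # 95 units of cleanup
print(work_if_filtered)     # 5 units of restoring false positives
```

Under these assumptions, lowering the threshold only stops paying off
once restoring a false positive costs more than 19 times as much as
cleaning up a spam (95 / 5 = 19).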