Re: newer generation of spam
On Sat, May 24, 2003 at 01:18:58AM -0400, kenneth dombrowski wrote:
>
> Hi Graeme,
>
> You mean like this?
[sample spam snipped... just in case someone would bounce emails containing
obfuscating comments from this list]
> I've noticed that too. Spamassassin on my Woody server catches it, but
> it's not the official Woody version (I think 2.20 is in Woody?). It's
> not the latest spamassassin either, I've read here that the 2.5 release
> has a bayesian system, but since I only get like one piece of spam in my
> inbox each month, I haven't bothered upgrading from 2.43. I never
> changed the defaults & using the OBFUSCATING_COMMENT technique scores
> them a 2.1 on the spam-meter (see below)
Yes, the OBFUSCATING_COMMENT technique looks appropriate here. Looking
at the header of your email, it appears that the debian-user
spamassassin (version=2.53-lists.debian.org_2003_04_28) does not use the
OBFUSCATING_COMMENT test. (The spamassassin 2.55 docs do not mention
this test.) Also, your sample spam would not be classified
as spam without this test (hits=4.9 required=5.0).
> You think Bogofilter is allowing that to obfuscate the meaning of the
> text? I would find that to be fairly surprising considering how long
> spam has been HTMLized (I'm thinking of <font size=6
> color=red>F</font><font size=4 color=blue>irst letter is big and
> red</font> kind of stuff)
Yes, bogofilter 0.7.5+1017cvs-1 gets fooled (see below). When I built it
(Nov 6, 2002), it was the current testing package, and I was able to
compile it on my Woody/stable box. I can't do that with the current
testing package (0.12.3-1), because I get stuck in dependency hell with
the libc6 transition.
Here's the output of bogofilter -vv on your sample spam:
X-Bogosity: No, tests=bogofilter, spamicity=0.000497, version=0.7.5.1
# 0.175014 0.175014 line
# 0.313187 0.088204 our
# 0.400000 0.060584 cribed
# 0.400000 0.041222 e3mo3hg3
# 0.400000 0.027864 ggyu9229gx1
# 0.400000 0.018750 icat
# 0.400000 0.012579 med
# 0.400000 0.008421 pped
# 0.400000 0.005630 pres
# 0.400000 0.003760 shi
# 0.400000 0.002510 skks8d369ngc
# 0.400000 0.001675 thipfx3czlak
# 0.400000 0.001117 x2xeh3sq3jzlj
# 0.400000 0.000745 xhn6t43gl480
# 0.400000 0.000497 xsw3j1y5i2
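The fragments in that listing (pres, cribed, med, icat, shi, pped) are what
you would expect if each HTML comment is treated as a word break and its body
is kept as text. Here is a minimal Python sketch of that effect. It is not
bogofilter's actual lexer; the function names and the sample string are made
up for illustration (the real spam was snipped), reusing some of the junk
comment strings from the listing.

```python
import re

# Hypothetical helpers, not taken from bogofilter's source.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
WORD = re.compile(r"[a-z0-9]+")


def naive_tokens(html):
    """Treat comment delimiters as separators and keep the comment body."""
    text = html.replace("<!--", " ").replace("-->", " ")
    return WORD.findall(text.lower())


def stripped_tokens(html):
    """Delete whole comments first, so split words are rejoined."""
    return WORD.findall(COMMENT.sub("", html).lower())


# Made-up sample with comments placed in the middle of words.
sample = ("pres<!--xsw3j1y5i2-->cribed med<!--e3mo3hg3-->ication "
          "shi<!--ggyu9229gx1-->pped")

print(naive_tokens(sample))
print(stripped_tokens(sample))
```

The first call yields word fragments plus the junk strings; the second yields
the whole words, which is what a filter needs to see in order to score them.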
Look how the bogofilter result changes if I shift the obfuscating comments
to word boundaries:
X-Bogosity: No, tests=bogofilter, spamicity=0.849039, version=0.7.5.1
# 0.277706 0.277706 online
# 0.324324 0.155797 get
# 0.372829 0.098862 shipped
# 0.400000 0.068154 e3mo3hg3
# 0.400000 0.046492 ggyu9229gx1
# 0.400000 0.031483 hoqjq82piiydf
# 0.400000 0.021211 skks8d369ngc
# 0.400000 0.014241 thipfx3czlak
# 0.400000 0.009540 ww6dzu382t
# 0.400000 0.006380 x2xeh3sq3jzlj
# 0.400000 0.004262 xhn6t43gl480
# 0.400000 0.002846 xsw3j1y5i2
# 0.720358 0.007298 medication
# 0.885426 0.053756 prescribed
# 0.990000 0.849039 overnight
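In both listings the first column looks like the per-token probability and the
second like a running total: each value is consistent with folding the next
token into the previous result using the usual naive-Bayes combination
p = prod(p_i) / (prod(p_i) + prod(1 - p_i)). That is my reading of the numbers,
not something taken from the bogofilter docs. A small Python sketch, with a
function name of my own choosing, that recomputes the second column from the
first:

```python
def running_spamicity(token_probs):
    """Return the combined probability after each token is folded in."""
    spam, ham = 1.0, 1.0
    running = []
    for p in token_probs:
        spam *= p          # product of p_i
        ham *= 1.0 - p     # product of (1 - p_i)
        running.append(spam / (spam + ham))
    return running


# First-column values copied from the two listings above.
split_words = [0.175014, 0.313187] + [0.4] * 13
whole_words = ([0.277706, 0.324324, 0.372829] + [0.4] * 9
               + [0.720358, 0.885426, 0.99])

for name, probs in (("split", split_words), ("whole", whole_words)):
    print(name)
    for p, r in zip(probs, running_spamicity(probs)):
        print(f"  {p:.6f} {r:.6f}")
```

It also shows why the unknown 0.4 tokens matter: every junk string or word
fragment drags the total down, and only the intact words at the end of the
second listing pull it back up.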
> (I wonder how many bogofilter users' filters will filter this message
> vs. the spamassassin users)
I don't filter this list, because the list filter already does, and I
want to be able to participate in discussions about spam.
Kenneth, thanks for the short intro to spamassassin.
Looking at the bogofilter and spamassassin homepages, it appears that
both testing packages can handle obfuscating HTML comments. I couldn't
easily find documentation on stable spamassassin 2.2x, so I don't know
whether the stable spamassassin would cope with them ...
Time to find out.
Graeme