
Re: newer generation of spam



On Sat, May 24, 2003 at 01:18:58AM -0400, kenneth dombrowski wrote:
> 
> Hi Graeme,
> 
> You mean like this? 

[sample spam snipped... just in case someone would bounce emails containing
obfuscating comments from this list]

> I've noticed that too. Spamassassin on my Woody server catches it, but
> it's not the official Woody version (I think 2.20 is in Woody?). It's
> not the latest spamassassin either, I've read here that the 2.5 release
> has a bayesian system, but since I only get like one piece of spam in my
> inbox each month, I haven't bothered upgrading from 2.43. I never
> changed the defaults & using the OBFUSCATING_COMMENT technique scores 
> them a 2.1 on the spam-meter (see below) 

Yes, the OBFUSCATING_COMMENT test looks appropriate here. Judging by
the headers of your email, the debian-user spamassassin
(version=2.53-lists.debian.org_2003_04_28) does not use the
OBFUSCATING_COMMENT test. (The spamassassin 2.55 docs do not mention
this test.) Without it, your sample spam would not be classified as
spam (hits=4.9 required=5.0).

> You think Bogofilter is allowing that to obfuscate the meaning of the
> text? I would find that to be fairly surprising considering how long
> spam has been HTMLized (I'm thinking of <font size=6
> color=red>F</font><font size=4 color=blue>irst letter is big and 
> red</font> kind of stuff)

Yes, bogofilter 0.7.5+1017cvs-1 gets fooled (see below). Back when I
built it (Nov 6, 2002), it was the current testing package, and I was
able to compile it on my Woody/stable box. I can't do that with the
current testing package (0.12.3-1), because I get stuck in dependency
hell with the libc6 transition.

Here's the output of bogofilter -vv on your sample spam:

  X-Bogosity: No, tests=bogofilter, spamicity=0.000497, version=0.7.5.1

  #  0.175014  0.175014  line
  #  0.313187  0.088204  our
  #  0.400000  0.060584  cribed
  #  0.400000  0.041222  e3mo3hg3
  #  0.400000  0.027864  ggyu9229gx1
  #  0.400000  0.018750  icat
  #  0.400000  0.012579  med
  #  0.400000  0.008421  pped
  #  0.400000  0.005630  pres
  #  0.400000  0.003760  shi
  #  0.400000  0.002510  skks8d369ngc
  #  0.400000  0.001675  thipfx3czlak
  #  0.400000  0.001117  x2xeh3sq3jzlj
  #  0.400000  0.000745  xhn6t43gl480
  #  0.400000  0.000497  xsw3j1y5i2
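
A note on reading this output: the first column looks like the per-token
spam probability, and the second column appears to be a running combined
score. The numbers are consistent with Paul Graham's naive Bayes
combination, P = prod(p) / (prod(p) + prod(1 - p)), applied to the tokens
seen so far. That this is what bogofilter 0.7.5 actually computes is an
assumption inferred from the figures, not something checked against its
source. A minimal Python sketch (the function name `combine` is made up
for illustration):

```python
from math import prod


def combine(probs):
    """Graham-style combination of per-token spam probabilities."""
    spam = prod(probs)                  # evidence that the message is spam
    ham = prod(1.0 - p for p in probs)  # evidence that it is not
    return spam / (spam + ham)


# First three per-token values from the output above (line, our, cribed).
first_run = [0.175014, 0.313187, 0.400000]

# Print the running combined score after each token.
for i in range(1, len(first_run) + 1):
    print(f"{combine(first_run[:i]):.6f}")
```

With these inputs the running values agree with the second column of the
first three rows, to within rounding. Every 0.400000 token multiplies
the odds by 0.4/0.6, so a long run of unknown fragments steadily drags the
score toward zero.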

Look how it changes if I shift the obfuscating comments to word
boundaries:

  X-Bogosity: No, tests=bogofilter, spamicity=0.849039, version=0.7.5.1

  #  0.277706  0.277706  online
  #  0.324324  0.155797  get
  #  0.372829  0.098862  shipped
  #  0.400000  0.068154  e3mo3hg3
  #  0.400000  0.046492  ggyu9229gx1
  #  0.400000  0.031483  hoqjq82piiydf
  #  0.400000  0.021211  skks8d369ngc
  #  0.400000  0.014241  thipfx3czlak
  #  0.400000  0.009540  ww6dzu382t
  #  0.400000  0.006380  x2xeh3sq3jzlj
  #  0.400000  0.004262  xhn6t43gl480
  #  0.400000  0.002846  xsw3j1y5i2
  #  0.720358  0.007298  medication
  #  0.885426  0.053756  prescribed
  #  0.990000  0.849039  overnight
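
The difference between the two runs can be illustrated with a small
tokenizer sketch. It assumes a tokenizer that simply splits on anything
that is not a letter or digit, which is a simplification and not
bogofilter's real lexer. The sample string and its filler comments
(aaa111, bbb222) are invented for the example and are not taken from the
snipped spam.

```python
import re

# Non-greedy match so that each HTML comment is removed separately.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text, strip_comments):
    """Return the sorted set of lowercase tokens found in text."""
    if strip_comments:
        text = COMMENT.sub("", text)
    return sorted(set(TOKEN.findall(text.lower())))


# Hypothetical sample with comments placed in the middle of words.
sample = "pres<!--aaa111-->cribed med<!--bbb222-->ication"

print(tokens(sample, strip_comments=False))
print(tokens(sample, strip_comments=True))
```

Without stripping, the filter only ever sees fragments such as "pres" and
"cribed", which it has never scored. With the comments removed first, the
whole words "prescribed" and "medication" come back, and those are the
tokens that carry high spam probabilities in the second run above.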

> (I wonder how many bogofilter users' filters will filter this message
> vs. the spamassassin users)

I don't filter this list, because the list filter already does, and I
want to be able to participate in discussions about spam. 

Kenneth, thanks for the short intro to spamassassin.

Looking at the bogofilter and spamassassin homepages, it appears that
both testing packages can handle obfuscating HTML comments. I couldn't
easily find documentation on stable spamassassin 2.2x, so I don't know
whether the stable spamassassin would work ...

time to find out.

Graeme


