Re: More on spam
On Sun, Oct 19, 2003 at 01:35:57PM +0100, Karsten M. Self wrote:
> on Fri, Oct 17, 2003 at 02:23:39PM +0100, Colin Watson (firstname.lastname@example.org) wrote:
> > On Fri, Oct 17, 2003 at 05:56:26AM -0700, Paul Johnson wrote:
> > > On Fri, Oct 17, 2003 at 05:36:40AM -0700, Tom wrote:
> > > > What does this have to do with spam? It bemuses and befuddles me to
> > > > observe extremely intelligent people swatting the air with tools
> > > > like spamassassin, when the correct solution lies elsewhere. The
> > > > correct solution is to merely enlighten all of humanity not to send
> > > > spam.
> > >
> > > Spamassassin is one of many tools to do this. Simply using
> > > spamassassin to delete your email is not going to get the job done.
> > > You have to follow through with other means to get the spammer's
> > > webhosts and email providers involved to cut them off.
> > If only this were practical for the volume of spam bugs.debian.org gets
> > (2GB caught by spamassassin in the last two weeks). We just don't have
> > the manpower even to make a dent here. :-(
> My response then would be that throwing manpower at the problem is the
> wrong thing for a number of reasons:
> - Debian is a volunteer project. Manpower is always in short supply,
> and throwing it at this pulls it from other tasks.
> - Responding to spam isn't particularly fruitful. It doesn't leverage
> itself meaningfully.
> - There are other ways to rein in the problem and/or raise costs.
> I'd recommend the following approaches:
> - Keep stacking on the filters. Automated measures do seem to work,
> and can be leveraged -- *everyone* has a spam problem.
> - Run and keep stats on spam. There are several dimensions which are
> interesting, among them:
> - Relative amounts of spam vs. ham.
> - Origins by nation.
> - Origins by network.
> - Origins by service classification (fixed IP, dynamic, DUL).
> - SA (or other classifier) scores on ham.
> - SA (or other classifier) scores on spam.
> - Top originating IPs for ham.
> - Top originating IPs for spam.
> - Frequency of occurrence trends for ham. For a given mailserver,
> how many messages are received, say, weekly, classed as ham. I
> expect that a reasonably small number of servers will originate a
> large amount of mail, and a larger number will originate a smaller
> amount.
> - Frequency of occurrence trends for spam. For a given mailserver,
> how many messages are received, say, weekly, classed as spam. I
> expect that a small number (smaller than the first group above)
> will originate a moderate amount of spam, and that most spam will
> originate from previously unknown servers.
> - Spam/Ham mix by server. I suspect you can pretty much classify
> hosts as hammy or spammy, with some being moderately grey.
> - I see the future of spam management as making mailservers much
> more intelligent about the hosts they receive mail from.
> Typically good hosts will get preferential treatment. Bad hosts
> will be dropped. Previously unknown hosts will get serviced but
> only after some razzing. Advertising different MX hosts to known
> and unknown query origins, and hosting these on different nets with
> different service levels, is also likely (this is a modification of
> Brad Templeton's current "best plan" for spam). The net result is
> that good transmitting MTAs get priority access, bad MTAs don't
> steal resources, and are themselves forced to pay through time or
> other resource costs to send mail -- but all in a way that's
> compatible with current SMTP protocols.
> Karsten M. Self <email@example.com> http://kmself.home.netcom.com/
> What Part of "Gestalt" don't you understand?
> Bush/Cheney '04: Leave no billionaire behind
I like this suggestion. I know I don't know a lot about what spam really is.
I sense from reading this thread that others also don't know a lot. Some do,
but many don't. So research that results in firm numbers about the nature of
the problem is clearly a good thing.
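To make the per-server bookkeeping concrete, here is a rough Python sketch of
the "spam/ham mix by server" statistic from Karsten's list. The input format
(pairs of originating IP and a spam flag), the function names, and the 0.2/0.8
thresholds are all my own assumptions for illustration; none of this reflects
what bugs.debian.org actually runs.

```python
from collections import Counter


def tally_by_server(messages):
    """Count ham and spam per originating IP.

    `messages` is an iterable of (ip, is_spam) pairs. This input format is
    an assumption made for the sketch, not a real log format.
    """
    ham, spam = Counter(), Counter()
    for ip, is_spam in messages:
        (spam if is_spam else ham)[ip] += 1
    return ham, spam


def classify_server(ham_count, spam_count, low=0.2, high=0.8):
    """Label a server by the fraction of its mail that was spam.

    The `low` and `high` thresholds are arbitrary placeholders.
    """
    total = ham_count + spam_count
    if total == 0:
        return "unknown"
    spam_fraction = spam_count / total
    if spam_fraction <= low:
        return "hammy"
    if spam_fraction >= high:
        return "spammy"
    return "grey"


def server_report(messages):
    """Map each originating IP seen in `messages` to a label."""
    ham, spam = tally_by_server(messages)
    return {
        ip: classify_server(ham[ip], spam[ip])
        for ip in set(ham) | set(spam)
    }
```

Fed a week's worth of already-classified mail, server_report() would give the
hammy/spammy/grey split per host, and the two counters give the top
originating IPs for each class.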
One addition to Karsten's questions/issues:
It has been claimed that one person's spam is another person's ham. To
what extent is this actually true? Or is this just obfuscation by the
advocates of spam? If we had collections of ham and spam that have
been accumulated by different users with different filter setups, we
could look for overlap and disjointness of sets. Or just run one
person's spam thru another person's filter. Lots of opportunities for
useful statistical studies.
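As a sketch of what such a study might compute, assuming each user's
collection is just a list of message bodies: the first function below measures
how much two spam collections overlap, and the second runs one person's spam
through another person's filter. The fingerprinting scheme (a hash of the
whitespace- and case-normalized body) and the idea that a filter is a plain
function returning True for spam are my assumptions; real spam varies enough
between copies that exact hashing would understate the overlap.

```python
import hashlib


def fingerprint(body):
    """Hash a message body after collapsing whitespace and case.

    This is a deliberately crude notion of "same message" (an assumption).
    """
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def overlap(spam_a, spam_b):
    """Jaccard index of two spam collections: |A & B| / |A | B|."""
    a = {fingerprint(m) for m in spam_a}
    b = {fingerprint(m) for m in spam_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disagreement_rate(spam_a, filter_b):
    """Fraction of A's spam that B's filter would let through as ham.

    `filter_b` is a hypothetical callable returning True for spam.
    """
    spam_a = list(spam_a)
    if not spam_a:
        return 0.0
    passed = sum(1 for m in spam_a if not filter_b(m))
    return passed / len(spam_a)
```

An overlap near 1 would suggest that people largely agree on what spam is; a
high disagreement rate between otherwise well-tuned filters would support the
"one person's spam is another person's ham" claim.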
But, a question: To what extent is it possible to trace a spam message back to
its human originator? Is the 'envelope from' really reliable? What sort of data
can/should be used to convict a 'perp'?
Paul E Condon