
Spam filtering



Hi,

	The script attached below will be included in the next version
 of mailagent. It grabs the latest list of spammers and can create one
 or more files in formats amenable to either mailagent or sendmail. I
 can easily massage it to produce files to be included in hosts.deny
 (I think).

	In view of that, I'd prefer seeing the functionality of
 tcpd extended to allow file inclusion in hosts.{allow,deny}.
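
	For illustration only, here is a minimal sketch of what a
 hosts.deny output filter might look like. The name filter_hostsdeny is
 hypothetical and is not part of the attached script; it is modelled on
 the existing filter_mailagent and filter_sendmail subs. It assumes that
 entries containing an @ should simply be skipped, and that a domain
 should be denied both by its bare name and by the leading-dot suffix
 pattern. Note that tcpd matches the connecting host, not the mail
 sender, so this is at best an approximation of the mail filters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical companion to filter_mailagent/filter_sendmail: turn one
# spam-list entry into a hosts.deny rule. Not part of the original script.
sub filter_hostsdeny {
    my ($entry) = @_;
    $entry =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
    return '' if $entry eq '';      # nothing to emit for blank entries
    return '' if $entry =~ /\@/;    # assumption: skip mail addresses entirely
    # Dotted-quad addresses are emitted as-is.
    return "ALL: $entry\n" if $entry =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;
    # For a domain, deny the bare name and every host beneath it.
    return "ALL: $entry, .$entry\n";
}

# The first two sample entries come from the comments at the end of the
# attached script; the address is a made-up example.
print map { filter_hostsdeny($_) }
    ('1floodgate.com', '205.254.167.57', 'someone@example.com');
```

 Run on its own, this prints one rule for the domain and one for the
 address, and nothing for the mail address:

     ALL: 1floodgate.com, .1floodgate.com
     ALL: 205.254.167.57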

	manoj

-- 
 Think it's time I'm leavin' / Nothin' here to make me stay. Led
 Zeppelin
Manoj Srivastava               <url:mailto:srivasta@acm.org>
Mobile, Alabama USA            <url:http://www.datasync.com/%7Esrivasta/>

======================================================================
#! /usr/bin/perl
require "getopts.pl";

#  Scott Blachowicz <scott@statsci.com>
# -l LISTNAME (colon sep list of indices into %urls hash)
# -o OUTPUT  (base prefix for split lists, filename for merged list)
# -s (split lists into individual files - default is on for 'mailagent' &
#     off for others)
# -S (turn off splitting)
# -t TYPE_OF_OUTPUT ("sendmail" or "mailagent" - default "mailagent")
# -v (verbose)
#
# This script is distributed under the same conditions that perl is.
#
&Getopts ('l:o:sSt:v');

my $lists = defined $opt_l ? $opt_l : 'ALL';
my $spam_base = defined $opt_o ? $opt_o : "$ENV{'HOME'}/etc/spamlist";
my $output_type = defined $opt_t ? $opt_t : "mailagent";
my $split_lists = defined $opt_s || ($output_type eq "mailagent");
$split_lists = 0 if defined $opt_S;
my $verbose = defined $opt_v;

use strict;

use Sys::Hostname;
my $host = hostname();
if ($host !~ /\./) {
  # Try to add a domain name?
  my ($name, $aliases, $addrtype, $length, @addrs) = gethostbyname($host);
  my @aliases = grep(/\./,split(/\s+/,$aliases));
  $host = $aliases[0] if @aliases;
}

use URI::Escape;
my $ftpuser = "ftp:despammer%40" . uri_escape ("$host");

use LWP::Simple;

my %urls = 
  ('aol', "http://www.idot.aol.com/preferredmail/",
   'mindspring', "http://www.atl.mindspring.com/cgi-bin/spamlist.pl",
   'znet', "http://www.znet.com/spammers.txt",
   ## too many bad matches: 'wsrcc', "http://www.wsrcc.com/spam/spamlist.txt",
   'iocom', "http://www.io.com/help/killspam.php",
   'nancynet', "ftp://ftp.cybernothing.org/pub/abuse/nancynet.domains",
   'cyberpromo', "ftp://ftp.cybernothing.org/pub/abuse/cyberpromo.domains",
   'llv', "ftp://ftp.cybernothing.org/pub/abuse/llv.domains",
  );

my %parsers = 
  ('aol', '&parse_aol($_)',
   'mindspring', '&parse_mindspring($_)',
   'iocom', '&parse_iocom($_)',
  );
my %unspam = 
  (
   'concentric.net', 'non-spam emails', #wsrcc
   'demon.net', 'non-spam emails',    #wsrcc
   'hotmail.com', 'free email used by non-spammers as well',
   'interactive.net', 'non-spam emails',    #znet
   'mindspring.com', 'non-spam emails',    #wsrcc
   'psi.net', 'non-spam emails',    #wsrcc
   'shoppingplanet.com', 'non-spam emails',
   'vnet.net', 'non-spam emails',   #wsrcc
   'yoyo.com', 'non-spam emails',
  );

if (! $split_lists) {
  open OUT, ">${spam_base}" or die  "create ${spam_base}: $!";
}

my $site;
foreach $site (keys %urls) {
  print "# Processing '$site' at URL $urls{$site}\n" if $verbose;
  
  if ($_ = get $urls{$site}) {
    if ($split_lists) {
      open OUT, ">${spam_base}-$site" or
	die  "create ${spam_base}-$site: $!";
    }
    
    ## 1) Filter out duplicate sites if going to one spamlist file.
    ## 2) Filter out '#'-started comments.
    ## 3) Filter out blank lines.
    ## 4) be sure $1 is what you want in the annotation
    ## 5) if no @ char, stick a "any user/any subdomain/host" regexp in.
    print OUT map {s/\#.*$//;
		   /\S/ && eval "\&filter_${output_type}(\$_)";
		 } grep((!$unspam{$_} &&
			 ($split_lists || !$unspam{$_}++)),
			($parsers{$site} ?
			 eval "$parsers{$site}" : split /\n/));
    close OUT if $split_lists;
  }
  else {
    warn "Cannot get $urls{$site}\n";
  }
}
close OUT if !$split_lists;

sub filter_mailagent {
    local($_) = @_;
    "/^(" . (/\@/ ? "" : "(.*[\@.])?") . "\Q$_\E)\$/i\n";
}

sub filter_sendmail {
    local($_) = @_;
    "$_ " . (/\@/ ? "SPAMMER" : "JUNK") . "\n";
}

sub parse_aol {
    local($_) = @_;
    if (! s/^[\s\S]*<MULTICOL.*\n//) {
        warn "parse_aol: missing MULTICOL in $_ ";
        return ();
    }
    if (! s/<\/PRE[\s\S]*//) {
        warn "parse_aol: missing /PRE in $_ ";
        return ();
    }
    split /\n/;
}

sub parse_mindspring {
    local($_) = @_;
    if (! s,^[\s\S]*?<pre>[^\n]*\n,,) {
        warn "parse_mindspring: can't find block of hostnames";
        return ();
    }
    if (! s,</pre>[\s\S]*?<pre>[^\n]*\n,,) {
        warn "parse_mindspring: can't find block of email addresses";
        return ();
    }
    s,</pre>[\s\S]*$,,;
    split /\n/;
}

sub parse_iocom {
    local($_) = @_;
    if (! s,^[\s\S]*?<H(\d)>Blocked\s*Domains</H\1>[\s\S]*?<TABLE[^\n]*\n,,) {
        warn "parse_iocom: can't find 'Blocked Domains' table";
        return ();
    }
    if (! s,</TABLE[\s\S]*,,) {
        warn "parse_iocom: can't find end of 'Blocked Domains' table";
        return ();
    }
    s,<[^>]+>,,g;
    split /[\s\n]+/;
}

## This version written by:
##  Scott Blachowicz <scott@statsci.com>
### Originally from...
### 
### which creates a series of lines in "~/.spamlist" that look like:
### 
###     /^((.*[@.])?1floodgate\.com)$/i
###     /^((.*[@.])?205\.254\.167\.57)$/i
### 
### which happen to be directly useful in .rules lines that look like this:
### 
###     ## flag spam (thank you, AOL!)
###     <TO_MERLYN> Envelope From Sender Relayed Reply-To: "~/.spamlist" {
### 	    ANNOTATE -d X-merlyn-spam Smells like spam from %1;
### 	    ## eventually, file in list.spam or delete,
### 	    ## but for now, just testing...
### 	    REJECT;
###     };
### 
### This is still a work in progress, but I thought I'd publish this alpha
### release in case anyone else wanted to hack along with me.
### 
### -- 
### Name: Randal L. Schwartz / Stonehenge Consulting Services (503)777-0095
### Keywords: Perl training, UNIX[tm] consulting, video production, skiing, flying
### Email: <merlyn@stonehenge.com> Snail: (Call) PGP-Key: (finger merlyn@ora.com)
### Web: <A HREF="http://www.stonehenge.com/merlyn/">My Home Page!</A>
### Quote: "I'm telling you, if I could have five lines in my .sig, I would!" -- me


--
TO UNSUBSCRIBE FROM THIS MAILING LIST: e-mail the word "unsubscribe" to
debian-devel-request@lists.debian.org . 
Trouble?  e-mail to templin@bucknell.edu .

