on Wed, Sep 15, 2004 at 07:45:07AM +0200, martin f krafft (madduck@debian.org) wrote:
> Dear developers,
>
> Over the past 2 months, about 80 or 90% of the spam I received
> through @d.o came from the networks of Chinanet. I have reported
> every issue, but they never responded, nor are they taking counter
> measures. Some of the spammer's IPs have remained constant. This
> suggests to me that they are spammer-cooperative (or generally
> incompetent).
>
> May I suggest that we block Chinanet? Their subnets are
>
>   222.64.0.0/13
>   222.72.0.0/15
>   202.108.181.0/24
>   221.224.0.0/13
>   218.78.0.0/15
>   218.80.0.0/14

See "Incidentally" below for more on these specific netblocks.

> Or you can use rbl.madduck.net, which filters them. I think we could
> potentially cut a lot of spam by blocking these IPs.

In my typical late-to-the-party fashion, a few additional comments on
this topic. Some of which I've previously discussed with Martin
off-list.

I've been working with ASN and CIDR data associated with spam received
via my ISP account. While the specific findings I've got may be
interesting, the methods are of more general use.

Short answer: you can classify incoming mail using its IP into its
network of origin, with a DNS query.

Background: ASN identifies the Autonomous System. Effectively, these
are the networks the Internet is networking between. Each is defined by
a single span of routing authorities, peers, etc., and largely,
organizational authority.

In other words: you've got an identifiable, accountable entity with a
definable network space. More to the point: they're _accountable_ for
that space, and had damned well better be keeping it clean.

By getting the ASN associated with an IP and tracking same for spam
received, it's pretty easy to find out where the bulk of spam is coming
from. My stats _don't_ reflect where ham is originating from, so I'm
getting raw volume, but not ratio, data here. This could be tacked on
to a better-developed system.
There's a very strong power-law relationship in which ASNs contribute
what proportion of spam. Over the past nine months:

  - A single ASN has contributed ~15% (12-17%) of all spam I receive.
  - 4-5 ASNs account for a quarter of all spam.
  - 20-30 ASNs account for half of all spam.

I track results at:

    http://linuxmafia.com/~karsten/monthly-asn-report-current.txt

...as well as history. See my homepage for details.

Working with ~20 days' spam, I get the following breakout for the top
20 ASNs (the report linked above provides additional details such as
name of the network). This is based on a total of 7093 spams, and
includes 817 ASNs. There are ~24k assigned ASNs total.

     1   1099  ASN-4766
     2    347  ASN-4134
     3    263  ASN-9105
     4    256  ASN-9277
     5    134  ASN-4814
     6    122  ASN-4837
     7    114  ASN-3352
     8    111  ASN-12076
     9     97  ASN-18747
    10     93  ASN-11908
    11     82  ASN-7132
    12     81  ASN-9924
    13     80  ASN-7418
    14     78  ASN-6939
    15     78  ASN-3269
    16     77  ASN-3786
    17     75  ASN-8346
    18     69  ASN-;;
    19     68  ASN-4713
    20     54  ASN-3462

These include: KORNET, China Telecom, tiscali-uk, thrunet, chna169,
CNCGROUP (China), China Network Communications, TDE (Spain), MSN, IFX,
Verestar, SBC, Taiwan Fixed Network, Hurricane Electric, Telecom
Italia, DACOM (Korea), Sonatel (Senegal), NTT-OCNET, Chunghwa Telecom
(China).

For CIDR my data show the top 20 being the following.

     1    388  222.96.0.0/12
     2    259  212.74.96.0/19
     3    256  221.144.0.0/12
     4    200  61.72.0.0/13
     5     93  64.4.0.0/18
     6     93  220.120.0.0/13
     7     90  61.254.0.0/15
     8     90  195.166.237.0/24
     9     81  200.73.64.0/19
    10     70  213.154.64.0/19
    11     67  connection/timed
    12     63  61.31.128.0/19
    13     61  64.71.128.0/18
    14     51  165.165.0.0/16
    15     44  213.215.128.0/18
    16     43  192.118.68.0/22
    17     42  211.36.160.0/19
    18     41  211.110.0.0/16
    19     37  212.216.128.0/17
    20     35  80.88.128.0/20

These include: KORnet, Tiscali, KORnet again several times, Hotmail,
etc. ('whois' on the IP will give you this.)

All well and good. How's it work?
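As a rough illustration of the concentration described above, the following Python sketch tallies the cumulative share of the top-20 ASN counts against the 7093-spam total. The counts are copied from the ASN table; the function name `asns_needed` is hypothetical and not part of the original report scripts.

```python
# Spam counts for the top 20 ASNs, in rank order, copied from the table above.
# (Entry 18 is the "ASN-;;" row; it is counted here exactly as listed.)
TOP20_COUNTS = [1099, 347, 263, 256, 134, 122, 114, 111, 97, 93,
                82, 81, 80, 78, 78, 77, 75, 69, 68, 54]
TOTAL_SPAM = 7093  # total spams in the ~20-day sample


def asns_needed(counts, total, fraction):
    """Return how many top-ranked entries are needed to reach `fraction`
    of `total`, or None if the listed entries never get that far."""
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running >= fraction * total:
            return n
    return None


if __name__ == "__main__":
    top_share = 100.0 * TOP20_COUNTS[0] / TOTAL_SPAM
    print("top ASN share: %.1f%%" % top_share)
    print("ASNs for a quarter:", asns_needed(TOP20_COUNTS, TOTAL_SPAM, 0.25))
    print("ASNs for half:", asns_needed(TOP20_COUNTS, TOTAL_SPAM, 0.50))
```

On this sample the top 20 together fall a little short of half the total, which is in line with the "20-30 ASNs account for half" figure.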
Simple:

    host -t txt <reversed ip>.asn.routeviews.org

...returns the ASN and CIDR for a given IP in parseable format as a DNS
query. E.g.:

    $ host murphy.debian.org
    murphy.debian.org has address 146.82.138.6

    $ host -t txt 6.138.82.146.asn.routeviews.org
    6.138.82.146.asn.routeviews.org text "27354" "146.82.136.0" "21"

So, that's AS27354, with CIDR 146.82.136.0/21. A subsequent 'whois
AS27354' will tell you that this is LayerOne Holdings, Inc.

For more general information:

    http://www.routeviews.org/

The data are compiled directly from BGP router maps. My understanding
is that the zonefiles are downloadable (I'm checking on this now).
They're certainly cacheable.

More to the point: the data are available at SMTP time. The one bit of
data you've got is your SMTP peer's IP. It really doesn't matter if
this is the point of origin of the spam or just an upstream relay. If
you know you're getting bad traffic from this network (ASN or CIDR),
you can take appropriate action[1].

It's also possible, as I've done, to look at volumes of spam by ASN or
CIDR. Better, as I indicated, would be *ratios*. A peer with a very
high ham (non-spam) ratio, which has a spam volume that on an absolute
scale is high, but proportionate to total traffic is middlin', might be
allowed through. Incidentally, the ratio data should fall out of your
Bayes classifier token database if you know how to parse it.

Because the data can be encoded into firewall rules, it's possible to
reduce mail filtering load by offloading this to your iptables rules.
Any mail (or optionally: all) packets from highly hostile networks can
be blocked. Or rate limiting can be applied. I'm particularly fond of
the idea of rejecting packets from a network in proportion to its
spam:ham ratio....

If you're not comfortable blocking by ASN, CIDR data give a slightly
finer level of control. Even for a particularly standout bad net such
as KORNET, there are CIDRs which are markedly worse than others, from a
total volume perspective.
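The lookup described above can be sketched in Python as two small helpers: one builds the reversed-octet query name, the other parses the TXT answer into an ASN and a CIDR. This is a minimal sketch, not the procmail rule referenced in the notes. The function names are hypothetical, the parser assumes the three quoted fields shown in the murphy.debian.org example, and the DNS query itself is left to `host` or whatever resolver you prefer.

```python
import ipaddress
import re

ZONE = "asn.routeviews.org"  # zone named in the text above


def routeviews_query_name(ip, zone=ZONE):
    """Build the reversed-octet query name for an IPv4 address."""
    # IPv4Address() validates the input and normalizes it to dotted quad.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def parse_routeviews_txt(answer):
    """Parse a `host -t txt` answer line into (asn, cidr).

    Assumes three quoted fields, as in the example above:
    ASN, network address, prefix length.
    """
    fields = re.findall(r'"([^"]*)"', answer)
    if len(fields) != 3:
        raise ValueError("unexpected TXT answer: %r" % answer)
    asn, network, prefix_len = fields
    return "AS" + asn, "%s/%s" % (network, prefix_len)


if __name__ == "__main__":
    print(routeviews_query_name("146.82.138.6"))
    line = '6.138.82.146.asn.routeviews.org text "27354" "146.82.136.0" "21"'
    print(parse_routeviews_txt(line))
```

The returned CIDR string can be handed to `ipaddress.ip_network()` if you want to test whether later peers fall inside the same block without another lookup.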
The other nice thing about this is you can base filtering on your
_own_, _current_ experience, and that relatively small sampling systems
generate useful statistics. Say, based on spam volumes for the current
and prior fortnight or month. Tracking historical data too far back
will result in previously clean nets being able to slide for a while.
Keeping only relatively current data avoids this problem (and will
probably be the subject of tuning arguments for years to come).

My experience is that the inhabitants of the top five or so spots tend
to remain in place for at least a few months at a time, particularly
the leader (KORnet in my experience), though over the course of nine
months or so I've seen considerable shifts in and out of the 2-5
positions.

Another point is that for many stable email communities, the set of
ASNs and/or CIDRs which correspond frequently is relatively small. For
a set of 850 recent emails to this list, there are 234 distinct IPs,
and 148 ASNs. Half of the volume was accounted for by 18 ASNs.

Of the top-20 spamming ASNs the following appear in the d-u posts
analyzed. "Freq" is frequency of occurrence in the d-u sample.
"Spam %" and "Spam Rank" are the percent contribution of these ASNs to
my total spam load, and the ranking in total spam received, of these
networks.

    Freq  ASN   Spam %  Spam Rank  Name
    ----  ---   ------  ---------  -------------------------------
      10  3352   1.4%       9      Internet Access Network of TDE
       2  8220   1.0%      16      COLT Telecommunications - www.colt.net
       1  3269   2.5%       5      TELECOM ITALIA

It would be helpful to run an analysis over a larger corpus of list
posts, but from the look of it, in the neighborhood of a quarter of
spam could be eliminated from d-u with a 0.12% false positive rate.
More selective filtering (say, CIDR rather than ASN) of less
egregiously spammy networks, and rate throttling rather than outright
rejection, might balance mail filtering with allowing legitimate mail
through.
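The 0.12% figure is not derived explicitly above. One reading that reproduces it, offered here as an assumption, is that blocking only the top five spam-ranked ASNs (roughly a quarter of spam) would have caught a single post out of the 850 sampled. A minimal Python sketch of that arithmetic, with a hypothetical function name and the rows copied from the table:

```python
SAMPLE_SIZE = 850  # d-u posts in the sample

# (ASN, posts in the d-u sample, spam rank), copied from the table above.
OVERLAP = [
    (3352, 10, 9),
    (8220, 2, 16),
    (3269, 1, 5),
]


def false_positive_rate(overlap, sample_size, max_rank):
    """Fraction of sampled list posts that would be rejected if every
    ASN with spam rank <= max_rank were blocked outright."""
    blocked = sum(posts for _asn, posts, rank in overlap if rank <= max_rank)
    return blocked / sample_size


if __name__ == "__main__":
    for max_rank in (5, 20):
        rate = 100.0 * false_positive_rate(OVERLAP, SAMPLE_SIZE, max_rank)
        print("block top %2d ASNs: %.2f%% of list posts lost" % (max_rank, rate))
```

Under the same assumption, widening the block to all of the top 20 would cost 13 of the 850 posts, which is the trade-off the paragraph above on more selective filtering is pointing at.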
Incidentally, of the IP ranges Martin proposes blocking, my own
experience shows:

                      Rank  Cum %   Pct   Spams  ASN    Description
                      ----  ------  ----  -----  -----  -------------
> 222.64.0.0/13        181  80.0%   0.1%     15   4812  China Telecom (Group)
> 222.72.0.0/15        Not assigned (possible bogon?)
> 202.108.181.0/24     492  92.8%   0.0%      3   4808  Chinanet Beijing Site AS
> 221.224.0.0/13         2  17.5%   4.1%    689   4134  China Telecom
> 218.78.0.0/15        181  80.0%   0.1%     15   4812  China Telecom (Group)
> 218.80.0.0/14        181  80.0%   0.1%     15   4812  China Telecom (Group)

...so at least in my experience, only 221.224.0.0/13 is a high
contributor, which might reduce the false positive rate significantly.
Of course, YMMV.

Peace.

--------------------
Notes:

1. If you want to use ASN in your procmail scripts, or to create a
   token which SpamAssassin and other Bayesian classifiers will
   automatically use, you can refer to my ASN procmail header creation
   rule here:

   http://linuxmafia.com/~karsten/Download/procmail-asn-header

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    Kerry / Edwards '04
    http://www.johnkerry.com/