On Sun, Aug 03, 2003 at 08:37:49PM +0200, David Fokkema (dfokkema@ileos.nl) wrote:
> On Sun, Aug 03, 2003 at 05:13:26AM +0100, Karsten M. Self wrote:
> > As some here are aware, I maintain a rant-o-matic with some standard
> > screeds on frequently iterated issues. The C-R issue is one that's been
> > nagging at me for a while, here's the draft of why C-R is considered
> > harmful. Critique/comment welcomed.
> >
> > You're probably receiving this because I've received a C-R from
> > your mail system. If you've received this, that is....
> >
> > Spam is a growing, heck, exploding problem. No doubt.
> > Challenge-response (C-R) is a flawed tactic, for the following
> > reasons.
> >
> > 0. Weak, and trivially abused, verification basis.
> >
> > The 'FROM:' header of email can be, and routinely is, spoofed.
> > It offers no degree of authentication or evidence of identity.
>
> As filtering is a spam-reduction system, so is C-R. The chance of
> receiving spam from a whitelisted address is, in the experience of tmda
> users, very rare. And even if it happens, it only adds up to low
> statistics.

C-R uses the "From:" header (with implementation-specific variations) as
a key. While a given key is going to have a relatively low likelihood
of being cleared by a given user, there are keys which will have a high
likelihood of being cleared. Off the top of my head, addresses at
@microsoft.com, @aol.com, @ebay.com, @*.gov, and other major commercial,
financial, and governmental institutions would be likely to be cleared
by a large number of users.

C-R moves you back to square one: SMTP can't provide authentication of
email headers. At the very least, contextual analysis of headers (as
Alan admits) is necessary. If you're already taking this step,
heuristic and Bayesian methods are a low-overhead next step, and they
have proven to be highly effective and accurate.

> > 1. Misplacement of burden.
> >
> > Effective spam management tools should place the burden either
> > on the spammer, or at the very least, on the person receiving
> > the benefits of the filtering (the mail recipient). Instead,
> > challenge-response puts the burden on, at best, a person not
> > directly benefitting, and quite likely (read on). The one party
> > who should be inconvenienced by spam consequences -- the spammer
> > -- isn't affected at all.
>
> The spammer is affected in the same way as it is affected by filtering:
> it sees revenues going down, fast.

No. The spammer sees delivery effectiveness diminished at the margin --
that is, on a message-by-message basis. That is precisely the same
effect as _any_ content- or context-based filtering achieves. C-R
doesn't lose in this regard, it just doesn't win. Tools such as
Teergrube and RBL _do_ fundamentally shift the balance against the
spammer.

> Additionally, if a spammer wants to deal with challenges, it has to
> not only use valid addresses, it has to use addresses which he can
> read replies from. That way, it is usually easy to report to his ISP.

Wrong. See response to 0. above.

> > 2. Privacy violation.
> >
> > A record of our correspondence is being maintained by a third
> > party who has no business knowing of the transaction. Many
> > people will refuse to respond to C-R requests for this reason.
>
> I don't see this point.

Several of the C-R proponents are making a number of assumptions. Most
of the _general_ discussion (that is, outside this mailing list) has
concerned service-model businesses in which C-R is provided and hosted
by a third party, which is then acquiring a rather interesting database
of communications patterns, which _must_ be maintained on a persistent
basis. Not the sort of thing I'd like to have available to an arbitrary
subpoena request.

Even TMDA is server-side based. This places it beyond the immediate
control of the typical home email user.

> > 3. Less effective at greater burden than receiver-side
> > whitelisting.

Note that what I'm discussing here is a system where the receiver
determines, and accumulates, his or her own whitelist. This is the
system I use, and it is completely transparent to the sender.
Essentially: I'm doing my own dirty work rather than outsourcing. The
advantage is that this gives me a greater degree of control over the
outcome. The incremental cost is very low, and the potential
offset-loss (missed message) is very high.

> > A C-R system is essentially an outsourced whitelist system. The
> > difference between a C-R system and a self-maintained whitelist
> > is that the latter:
> >
> > - Is maintained by the mail recipient, rather than a third party.
> > - Is the responsibility of the mail recipient, rather than the
> > sender.
> > - Places the burden on the recipient to add new addresses to
> > allow/deny lists.
> >
> > I might add that I myself use a mix of whitelisting and spam
> > filtering (via SpamAssassin) to filter my own mail with a very
> > high level of accuracy, in terms of true positives, true
> > negatives, false positives, and false negatives.
>
> I don't see the third party.

The third party is the C-R service provider, see response to 2. above.

> > 4. High type II error (beta).
> >
> > Because of numerous issues in sender-compliance with C-R
> > systems, C-R tends to a high false positive rate. This is known
> > as type II error, in statistical tests, and is denoted by beta.
> >
> > The mechanics of C-R systems lead to a fairly high probability
> > that users of such
>
> I don't know anything about this.

You or Alan. This is among the aspects of C-R systems which make them
completely unsuitable for any practical use, without safeguards such as
periodic review of queued messages (a practice Alan specifically
rejects) or secondary pass mechanisms.

A false positive (type II error) is ham mail that is incorrectly tagged
or otherwise treated as spam.
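To make the definition above concrete, here is a minimal Python sketch of how that rate could be measured from a hand-labelled sample of mail. It is an illustration only: the names `Message`, `ham_loss_rate`, and `spam_leak_rate` are hypothetical, and the sample counts are made up for the example, not measurements of TMDA or any other real system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Message:
    is_ham: bool           # ground truth from manual review: legitimate mail?
    treated_as_spam: bool  # what the filter or C-R system did with it


def ham_loss_rate(messages: Iterable[Message]) -> float:
    """Fraction of legitimate messages that were treated as spam."""
    ham = [m for m in messages if m.is_ham]
    if not ham:
        return 0.0
    return sum(m.treated_as_spam for m in ham) / len(ham)


def spam_leak_rate(messages: Iterable[Message]) -> float:
    """Fraction of spam messages that were delivered as if legitimate."""
    spam = [m for m in messages if not m.is_ham]
    if not spam:
        return 0.0
    return sum(not m.treated_as_spam for m in spam) / len(spam)


# Hypothetical sample: 8 ham (2 held as spam), 12 spam (1 delivered).
sample = (
    [Message(is_ham=True, treated_as_spam=False)] * 6
    + [Message(is_ham=True, treated_as_spam=True)] * 2
    + [Message(is_ham=False, treated_as_spam=True)] * 11
    + [Message(is_ham=False, treated_as_spam=False)] * 1
)

print(f"ham treated as spam: {ham_loss_rate(sample):.2%}")
print(f"spam delivered:      {spam_leak_rate(sample):.2%}")
```

With these made-up counts, `ham_loss_rate(sample)` is 0.25: two of eight legitimate messages were treated as spam. Measuring it at all requires someone to look at the held mail, which is why review of queued messages matters.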
C-R as defined assumes that mail is spam until otherwise determined.

By contrast, my own filtering system assumes a mail is of unknown nature
until otherwise determined. As various filters (whitelists, blacklists,
spamlists, SpamAssassin, mailing list filing, etc.) are applied, the
incoming message stream is categorized and filed. A message left at the
end of the stream is neither spam nor ham, it's grey (unknown). These
messages are then manually evaluated. I receive a handful of such
messages per day: recruiters, some spam, and contacts from previously
unknown correspondents or new addresses of known correspondents. It
takes only a few seconds to deal with these. The bulk (40-80+
messages daily) of my spam is automatically transferred to a spam folder
where a quick scan (10 seconds) is all that's necessary to ensure that
something hasn't been misfiled.

Alan apparently defines spam / unwanted mail, as an identity, to be
anything that arrives from someone who doesn't comply with a C-R
procedure. This is not a pragmatic definition.

For standard definitions and descriptions, see:

http://mathworld.wolfram.com/TypeIIError.html
http://www.acponline.org/journals/ecp/novdec01/primer_errors.htm

> > 5. Potential denial of service.
> >
> > C-R systems can be used in a denial-of-service or "Joe" attack
> > on an innocent third party. In fact, this is likely to start
> > happening shortly as C-R becomes more widespread.
> >
> > How? Simply: Spammer spoofs a legitimate sending address (this
> > is already commonplace). C-R systems then send out a challenge
> > to this address. With only 1% penetration of C-R, the victim of
> > the C-R/Spam attack is deluged with 100,000 challenge emails.
> > This could likely lead to lawsuits or other legal challenges.
>
> This can also be done right now. I guess that people who've had their
> mail addresses spoofed know this because they receive a lot of bounces,
> angry mails, replies from stupid people, etc.
> And if a spammer really
> wants to start a DoS attack, why use such an elaborate way if he can
> spoof whatever address he'd like?

It's not the intentional use of C-R by the spammer, it's the unintended
consequence. Though spamming a user population known to have a high
utilization of C-R (say, subscribers of ISPs or organizations known to
use C-R methods) could be intentional.

> > 6. C-R - C-R deadlock
> >
> > This is almost funny.
> >
> > How do two C-R system users ever start talking to each other?
> >
> > - User A sends mail to user B. While user B's address is then
> > known to A, user B's C-R server's mail is not.
> >
> > - User B's C-R system sends a challenge to A...
> >
> > - ...who intercepts the challenge with A's C-R system, which
> > sends a challenge to user B's C-R system...
> >
> > No, I didn't think this one up myself, see Ed Felten's "A
> > Challenging Response to Challenge-Response":
> >
> > http://www.freedom-to-tinker.com/archives/000389.html
> >
> > Bypassing this deadlock then opens an obvious loophole for
> > spammers to exploit.
>
> This must be very, very old. Uh, no... it's not. This guy obviously
> hasn't read through tmda.sourceforge.net. I'm sorry, tmda doesn't have
> a loophole. Only automatic whitelisting and dated addresses which, in
> the experience of tmda users, are not a risk. And even if they were a
> slight risk, C-R is a spam _reduction_ system.

This and other responses assume "well-designed" C-R systems. Current
experience with vacation responders and spam-notification filters
provides strong empirical evidence that a significant number of C-R
systems will in fact _not_ get this right.

The TMDA FAQ item 1.9 lists fifteen similar systems. I'd appreciate
your comments on the specifics of design, implementation, design
integrity, and handled mail volume of these systems, as well as of the
other C-R systems not listed in the TMDA FAQ.

> > 7. Potential integration into spam email harvest systems.
> >
> > One commonplace piece of advice for avoiding spam is to not
> > respond to opt-out, aka email validation testing, requests.
> >
> > C-R spoofing on the part of spammers would simply hijack a
> > presumption that C-R requests were valid to provide spammers
> > with higher-quality mailing lists.
>
> TMDA _always_ includes the original e-mail, so the recipient of the
> challenge can check if it really was him sending a mail.

See above response. This assumes a well-designed system. My
experience with C-R demonstrates the contrary.

In any event, habituation and uncertainty on the part of the typical
recipient of a challenge moot this point. See the current rash of
identity theft / CC theft scams based on "updating your account
information". C-R at best promotes bad personal identity protection
practices.

> > 8. Likely consequences.
> >
> > The C-R user is likely to find their own address added to
> > blocklists from many users and/or mailing list administrators
> > burned by malformed, or simply unwanted, C-R requests.
>
> If not set up correctly, this might happen.

First, see response to 6., regarding "well-designed" systems.

Second, this factor is entirely outside the bounds of the C-R system; it
is a reflection of the independent response of individuals and
organizations to receiving C-R challenges. C-R definitionally cannot
accommodate this.

Third, item 1.10 in the TMDA FAQ directly contradicts you: SpamAssassin
tags TMDA challenges as spam. SpamAssassin filters are a consensus
heuristic based on the aggregate spambase contributed by SpamAssassin
developers and users, modulo local filters and Bayesian modifications.
The consensus reality, then, as expressed by SpamAssassin users is that
C-R challenges are spam.

Beyond any semiotic arguments of what spam is or isn't, if the
operational reality is that SpamAssassin reflects the opinion of SA
users and developers and treats C-R transactions as spam, then my
statement 8. is validated.

> > 9. Mailing list burden.
> >
> > C-R systems typically malfunction on mailing lists in one of
> > two ways, neither of which is acceptable:
> >
> > 1. The C-R sends a challenge to the list for messages received.
> >
> > 2. The C-R sends a challenge to each individual listmember for
> > the first post received.
> >
> > In both cases, the burden is placed on a party who could care
> > less about the benefits of the C-R system. Several lists of my
> > acquaintance have taken to permanently banning any users who
> > exhibit use of misconfigured C-R systems.
>
> Those C-R systems are set up _very_ incorrectly.

See response to 6. regarding "well-designed" systems.

<...>

> >
> >
> > 10. Fails to address techno-economic underpinnings of spam.
> >
> > Spam exists for one reason: it's profitable.
> >
> > It's profitable because technology allows the costs of sending
> > a large number of mail messages to be lower than the revenues
> > available for doing so.
> >
> > Any effective spam remedy must attack one or the other side (or
> > both) of this equation: raise the costs or reduce the
> > technological effectiveness, on the one side, or reduce
> > revenues on the other.
> >
> > C-R, as with most recipient-side filtering systems, imposes
> > negligible incremental overhead on the spammer. A delivery is
> > made, the spam server moves on, the cost is a single SMTP
> > connection for a fractional second. Collateral costs are high:
> > for legitimate senders, spoofed reply addresses, mailing lists,
> > and retaliatory actions on the C-R user.
> >
> > A truly effective spam defense must attack the technical and
> > economic aspects, in as unobtrusive a manner as possible.
> >
> > The one system which seems to best fit this requirement is the
> > Teergrube -- the spam tar-baby, FAQ at:
> >
> > http://www.iks-jena.de/mitarb/lutz/usenet/teergrube.en.html
> >
> > A teergrubing mailserver costs a spammer multiple SMTP
> > connections, an inherently finite resource, for possibly hours.
> > Workarounds on the part of the spammer are possible, but all
> > result in higher costs, reduced delivery, or both. The net
> > effect is essentially a delivery payment requirement, though
> > the payment is in the form of time and configuration on the
> > part of the spammer. Collateral damage is low -- if a
> > teergrube _does_ unintentionally filter a legitimate sender,
> > the only cost is a single (or very small number of) delayed
> > delivery. This and other issues are covered at the FAQ above,
> > read it before posing hypothetical problems.
>
> You mention that workarounds are possible, but result in higher costs.
> The same is true for TMDA: by requiring spammers to set up a C-R
> auto-reply system, they make their mails traceable to their origin. I
> don't think spammers can spoof a valid mail address _and_ capture the
> replies.

Incorrect. See response to 0.

> To read a nice pro C-R page, read tmda.sourceforge.net. Check their FAQ,
> for example, or read the introduction. In particular, the part about
> 'Won't senders just refuse to confirm their messages?' is nice, if you
> read through to the part about using 'dated' addresses.

First, the apparent outcome here is distinctly different from the
experience reported in the FAQ.

Second, given that both you and Alan here fail to even understand what
Type II errors are, let alone countenance their existence in the face of
C-R, I find the evidence provided by you to be less than convincing.

Third, in edge cases (mass distribution, mailing lists, malformed
messages, poorly designed systems, spammer abuse of C-R concepts), this
behavior is very likely to increase, rather than decrease. FAQ item
1.5 specifically assumes direct personal communications, an assumption
which is in many cases invalid.

Fourth, the issues addressed in the TMDA FAQ are specific to TMDA and
cannot be generalized to other systems.
Specifically, several comments of Alan regarding his system design
directly contradict statements of the TMDA FAQ, particularly regarding
message loss.

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com> http://kmself.home.netcom.com/
What Part of "Gestalt" don't you understand?
At the sound of the toner, boycott Lexmark: trade restraint via DMCA.
http://news.com.com/2100-1023-979791.html