Re: Anti-Spam ideas for usenet/list harvested email addresses

To: debian-user@lists.debian.org
Subject: Re: Anti-Spam ideas for usenet/list harvested email addresses
From: "Jacob Anawalt" <jacob@cachevalley.com>
Date: Tue, 23 Sep 2003 16:32:12 -0600 (MDT)
Message-id: <3988.192.168.1.4.1064356332.squirrel@scsi-burn.office>
In-reply-to: <3F70AEE7.30306@etnsystems.com>
References: <1141.192.168.1.4.1064344598.squirrel@scsi-burn.office> <3F70AEE7.30306@etnsystems.com>
Rich Puhek said:
> (my reply is a bit disjointed, since I put things inline, and jumped
> around while crafting my response...sorry for the nonlinear thinking
> pattern)

'sOK. I thought you had some good points. Thanks for the input. Inline is
just right for me.

>
> Jacob Anawalt wrote:
>
>> To me the big question is how do I avoid the spam in the first place,
>> besides avoiding email altogether? I want to participate on the web, I
>> just don't want so much junk email nor do I want to have my mailbox or
>> ISP suffering from gigabytes of worm attachments or advertising data.
>>
>
> Your ISP should be filtering worms. It's fairly easy to do. If they
> don't want to bother with setting up a virus filter, hard drive space is
> fairly cheap. In addition, it would be nice if more ISPs filtered
> outgoing email as well. That's not always practical, and it won't stop
> the latest worms which sprechen SMTP, but it could help.

I don't want to spend CPU cycles, bandwidth or disk space scanning the
DATA section of an SMTP transfer, or scanning after receipt, to determine
whether it's mail I want in my inbox. (1)

How is the ISP filtering the mail if not by giving 250 OK to HELO, MAIL
FROM: and RCPT TO: and then accepting the DATA section?

>
>> We've all done or seen people do this: jacob at cachevalley dot com,
>> jacob.nospam@cachevalley.com, jacob@cachevalley.nospam.com, etc.
>>
>> Are we kidding ourselves thinking that if we can write a filter rule
>> that just catches SoBig.[A-Z], that someone else can't turn all of
>> those 'safe' addresses back into the real email address?
>>
> Spammers don't really care either way... look to the dictionary attack
> type of spammers for an example...("well, I've seen a
> jacob@some.company.com, so let's try "jacob@cachevalley.com" as well).
> The problem with turning a "safe" email address into a real one isn't a
> big deal, it just protects against the "dumb" harvesters. It's like
> using The Club on the steering wheel of your car... it won't defeat an
> experienced car thief, but it may convince him to skip your vehicle.
>
> In the case of a mailing list, I fail to see any advantage in the
> obfuscation of your email address, since it's present in the header. The
> exception would be private versus post-only addresses, as you mention
> below.

Yes, and jacob.lists@cachevalley.com would be as weak as
jacob@lists.cachevalley.com under your very valid point.
jlamaillists@cachevalley.com would be much better for my usenet/mailing
list address. Of course my real address will get spam because jacob is
common enough to try, along with admin, postmaster and webmaster, when
sending out Viagra ads. So I need to stop accepting email on that account
and get a new alias for normal email, but my personal mail spam isn't the
issue I'm focusing on. I'm looking for solutions to spam sent to addresses
that were exposed on usenet or mailing lists.

[snip]
>
>> Another thought I've had on the mailing list issues (besides wondering
>> why I'm trying to make mail act like a news client with threads and
>> looking for a 'watch thread' capable client) is that if I had an email
>> address to use on mailing lists that only accepted email from the list
>> servers I was on and rejected all others, I should only get the spam
>> that was relayed through the list.
>>
>> The mail server would need to have access to my personal list of
>> acceptable email addresses so it could give a 550 with the appropriate
>> extended SMTP code for unauthorized/security and an appropriate error
>> message after the HELO and MAIL FROM and RCPT TO: have been given. It
>> should only do this for mail accounts that have entries in the safe
>> list. If your list is empty, all email is valid. If you have one or
>> more entries, only those ones can send you email.
>>
>
> So in practice, the idea would work something like the following?
>
> 1) Create a "Debian-user only" address, which you'd use for posting to
> debian-user.
> 2) Email to the debian-user only address must come from the debian
> mailing list, or I'm going to SMTP-reject it, since it's probably from a
> spammer.

Exactly. Mostly. I'd like a "mailing list only" address that accepts mail
only from the lists I select.
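
A minimal sketch of that rule in Python, assuming a hypothetical
per-recipient safe list keyed on the envelope sender's domain. The
SAFE_LISTS table, the rcpt_decision name and the reply texts are
illustrative only and not taken from any particular MTA:

```python
# Hypothetical per-recipient safe lists: recipient address -> set of
# envelope-sender domains that address accepts. Illustrative data only.
SAFE_LISTS = {
    "jlamaillists@cachevalley.com": {"lists.debian.org"},
}


def rcpt_decision(recipient, mail_from):
    """Return the SMTP reply to give at RCPT TO time, before DATA."""
    allowed = SAFE_LISTS.get(recipient.lower(), set())
    if not allowed:
        # Empty or missing safe list: the address accepts all mail.
        return "250 2.1.5 OK"
    sender_domain = mail_from.rpartition("@")[2].lower()
    if sender_domain in allowed:
        return "250 2.1.5 OK"
    # 5.7.1: delivery not authorized, message refused.
    return "550 5.7.1 This address only accepts authorized mailing list traffic"


if __name__ == "__main__":
    print(rcpt_decision(
        "jlamaillists@cachevalley.com",
        "bounce-debian-user=jacob=cachevalley.com@lists.debian.org"))
    print(rcpt_decision("jlamaillists@cachevalley.com", "spammer@example.net"))
```

The envelope sender is easy to forge, so a check like this would only be
useful together with some verification of the connecting server.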

>
>> Some ideas for rules to accept or reject the email may include:
>>
>> If HELO does not match a reverse DNS lookup and doesn't match the domain
>> of RCPT TO: or to a user specified value then the mail is rejected.
>>
> In general, this will reject legit mail. In particular, sites that host
> for more than one domain will not have a reverse DNS matching what you
> might expect.
>
> If only applied to a particular mailing-list, it might work, though.
> Perhaps even IP address would be fine (debian-user-jacob emails must
> come from a server with reverse DNS of murphy.debian.org). Note that you
> cannot trust reverse DNS, though, so a forward lookup would also have to
> be done.

Forward and reverse. OK.

Under my definition, "Valid email for this address is _only_ email from
the debian-user list", would this drop valid email?
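
The forward-and-reverse check can be sketched in Python as below. This is
an illustration under assumptions, not how any given MTA implements it:
the function names are made up, the resolver functions are passed in as
parameters so they can be replaced, and socket.gethostbyname_ex only
handles IPv4.

```python
import socket


def forward_confirmed_name(ip, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the reverse DNS name for ip, but only if a forward lookup
    of that name maps back to the same ip. Otherwise return None."""
    try:
        name = reverse(ip)[0]           # (hostname, aliases, addresses)
        addresses = forward(name)[2]    # (hostname, aliases, addresses)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return None
    if ip not in addresses:
        return None
    return name.rstrip(".").lower()


def from_expected_server(ip, expected_name, **resolvers):
    """True if ip has forward-confirmed reverse DNS equal to expected_name."""
    confirmed = forward_confirmed_name(ip, **resolvers)
    return confirmed == expected_name.rstrip(".").lower()
```

A caller would pass the connecting client's address and the name it
expects, for example from_expected_server(client_ip, "murphy.debian.org").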

>
>> A looser match would be just on the HELO <name> where the name given is
>> some md5 hash of the user's email address and some value noted on the
>> mailing list. People start getting spammed, the list admin changes the
>> key used to generate the name value and people go to the web to see
>> what it has been changed to.
>>
>
> So the MTA on the Debian mail server, for instance, would have to be
> modified to generate a custom HELO for every message? This would really
> hurt for larger sites which have more than one recipient to a mailing
> list message...
>
>> A tighter setup might be to have the hash in the MAIL FROM: <value> and
>> have it be a hash of the subscriber's list password and their email
>> address. That way the subscriber can change their list password at any
>> time they see spam coming "from" the list.
>>
> But for most mailing lists, MAIL FROM: is the sender's email address. To
> change that would require modifying the mailing list software to break
> the header, or modifying everyone's mail client. Again, this could get
> ugly for sites with multiple subscribers to popular mailing lists.
>

Debian-user is auto-generating the MAIL FROM: to create the per-user
bounce path.

Return-Path: <bounce-debian-user=jacob=cachevalley.com@lists.debian.org>

If that is possible, I didn't think it would be a stretch to do the same
for HELO/EHLO, or to use some md5 hash instead of my email address.
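
As a rough sketch of both ideas in Python: the first two functions build
and check an md5 token from the subscriber address and a list secret, and
the third recovers the subscriber address from a bounce path shaped like
the Return-Path above. The function names, the "+" separator and the
assumed bounce format (inferred from that single example) are
illustrative assumptions, not the list software's actual scheme. A new
design would likely prefer an HMAC with a stronger hash over plain md5.

```python
import hashlib
import hmac


def list_token(subscriber_email, list_secret):
    """md5 hex digest of the subscriber address joined with a secret."""
    data = f"{subscriber_email.lower()}+{list_secret}".encode("utf-8")
    return hashlib.md5(data).hexdigest()


def token_matches(presented, subscriber_email, list_secret):
    """Compare a presented token with the expected one in constant time."""
    expected = list_token(subscriber_email, list_secret)
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected.encode("utf-8"))


def verp_subscriber(return_path, list_name="debian-user"):
    """Recover user@host from <bounce-LIST=user=host@listserver>.

    Returns None if the address does not have the assumed shape."""
    local = return_path.strip().strip("<>").partition("@")[0]
    prefix = f"bounce-{list_name}="
    if not local.startswith(prefix):
        return None
    user, sep, host = local[len(prefix):].rpartition("=")
    if not sep or not user or not host:
        return None
    return f"{user}@{host}"
```

Changing the secret changes every token, which matches the idea of a
subscriber changing their list password once spam starts arriving.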

So people don't take this out of context: I am not saying the Debian mail
server has to change. DNS works for Debian stuff so I think I'm set there.
I'm just pointing out possible ideas for other mailing lists that don't
have reverse DNS, or for better security against DNS or IP spoofing.

I do believe the Debian list server could use more help. The more spam it
has to fight the slower it will be without donations to upgrade bandwidth
and CPU, unless we can find a better solution - but that's a topic for a
different thread.

>> I'm sure there are other better ideas to be had along the lines of how
>> to quickly identify that the sending server is who they say they are
>> and look up a safe list to see if the user accepts email from that
>> server.
>>
>
> For a dead simple solution, set up a subdomain like
> @lists.cachevalley.com, and run a MTA dedicated to list traffic. Using
> existing SMTP access control, deny all access except for the IP
> addresses of servers you communicate with, and internal servers.
>

Simple is good. That is where I may start. As mentioned above my email
won't be jacob@lists.cachevalley.com.
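
The access rule for such a dedicated MTA amounts to an allow list of
client addresses. A small Python sketch of the check, where the networks
are placeholders (a documentation-range address standing in for a list
server, plus a private range for internal servers) and client_allowed is
a made-up name:

```python
import ipaddress

# Placeholder networks, not real list server addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.25/32"),   # stand-in for a list server
    ipaddress.ip_network("192.168.1.0/24"),  # stand-in for internal servers
]


def client_allowed(client_ip):
    """True if the connecting client's address is in an allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Not a valid IP address at all: refuse.
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.25", "192.168.1.4", "203.0.113.7"):
        print(ip, "accept" if client_allowed(ip) else "reject")
```

In practice the MTA's own access control would do this job; the sketch
only shows how little work the decision takes compared with scanning
message content.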

> You could even whitelist additional entries, perhaps by automatically
> scanning the mailing lists and (temporarily?) adding IP addresses of
> recent posters.

I like the idea of safe listing some posters and had it in mind, but then
I wondered how I would know they weren't being spoofed by someone else,
and what to do when their IP is dynamic. Judging by the comments I've seen
from some posters about how they feel towards people who email them
off-list, they may not want individual poster safe listing.

>
>> A side benefit of using an email address that only accepts list traffic
>> for some would be that it would reject the second email if someone
>> replies to you and the list. People using this setup could have their
>> .sig say "This email address only accepts authorized list traffic,
>> please reply to the list."
>>
>
> A simpler way is just make up something like
> "jacob-debian-list@cachevalley.com" as an email alias for yourself.
> Then, have procmail dump messages ^TO: that address into a folder,
> unless they do not come from murphy.debian.org, or something like that.
>
> You probably don't want to automatically delete them. You also probably
> don't want to tie it into the MTA, just in case something breaks down
> the line.
>

I'm happy with procmail and would use it to put X list into IMAP folder X.
It is a post-delivery mechanism though so not what I'm after for reducing
the spam in the first place.

I would be much happier giving the sender an SMTP reply at the start of
the SMTP conversation. I'm not going to be deleting the email, I'm going
to be sending a reply that says "550 5.7.1 This email doesn't exist for
anyone but authorized mailing lists."

http://www.faqs.org/rfcs/rfc1893.html

>> Since we have seen that a greater volume of worm mail is possible with
>> email addresses harvested from usenet and mailing lists, it seems a
>> setup based on this system could help cut down the cost of fighting
>> spam generated from those sources. The rules would be based on simple
>> lists, with each user responsible for maintaining their list. Much less
>> CPU power, bandwidth and storage space would be required to match those
>> rules because the matching is done before delivery is accepted. Mailing
>> lists could publish to their subscribe page the values they use for
>> HELO and MAIL FROM when sending the messages to all subscribers.
>>
> I'd differentiate between worms and spam more clearly. Worms/viruses are
> fairly easy to keep up with, in that daily updates of your anti-virus
> program will result in capturing virtually all viruses/worms with
> virtually no false positives. Plus, you'll catch direct client to client
> mail, instead of just mail to addresses harvested from mailing lists.
>

Same deal: this is a post-DATA, or post-delivery (after the 250 OK),
solution. Not what I'd like.

If my email address on this list didn't accept email from anyone but this
list then all the direct worm spam that happened the past few days would
have had much less of an impact in terms of bandwidth and disk space for
me.

Throw in some teergrubing and I can assist the community by slowing the
spammer down. That wasn't my goal though.

We (people with email available via usenet) were getting swamped with worm
generated messages. I was getting the same email from thousands of
individual computers. Maybe this is a bad knee-jerk solution to this. I'll
happily entertain other solutions. This worm will eventually be patched
and go away, but others will come.

>> Compare this to the "dog chasing cars" method of inventing a new filter
>> rule that looks through the MIME data to decide if this is the latest
>> worm you don't want or the kissing picture that you do. Sure it's cool
>> to be a geek and figure out the rules. If you like doing this, do it.
>> Maybe spam isn't a cost to you but a benefit if you consider your
>> enjoyment at solving each filter puzzle. I think that's why I like
>> finding bugs, to help find and solve puzzles. On the other hand this
>> method of filtering is more expensive in every measure I can think of
>> except the freedom of allowing anyone to email you anytime. You spend
>> time thinking up rules, writing rules and testing rules. The rules are
>> applied after you have accepted the bandwidth of the transfer. Running
>> the rules takes CPU time and possibly more bandwidth as you do RBL DNS
>> or Razor lookups, and storing the email takes disk space.
>>
> Again, there's a big difference between catching worms and catching
> spam. clamav's auto update ensures that my Amavis will catch just about
> everything worm related.
>

Ditto again on post DATA.


[snip]
>> One major concern that I've lightly touched on and will bring up again
>> is "What if I want to have other people contact me off list?" You
>> wouldn't want to post your non-list-only email to the list, that would
>> be counter-productive. There's got to be a convenient way of providing
>> a source for people to look up your email address that is very
>> resistant to scripting its harvest for the UCE/worms/etc. One idea that
>> comes to mind is images with your email address in them on your web
>> site. I keep thinking that PGP/GPG should be able to help in some way,
>> either by adding to the EHLO command set or something on the user's web
>> site. There have to be better and still simple ways of doing this that
>> make it cost much more to find our email addresses than it costs us to
>> filter the junk.
>>
> True. But you still don't solve the problem of having someone easily
> contact you off list. In the case of this email, I've decided I have
> something worthwhile to say on the topic at hand (or I'm bored, and want
> to babble about email filters...) so I hit "reply to all". If I had to
> break my train of thought to sift through your website to find your
> email address, I'm probably not going to bother. Also consider the fact
> that some people do have to read email offline, and rely on the
> assumption that all necessary contact info is contained in the email
> itself.
>
> Enhancing EHLO would probably not be realistic, given that virtually all
> email clients would have to implement it. It's like saying "oh, just
> turn on SMTP authentication, and we can be sure that the sender isn't a
> spammer, or at least can track them down".
>
> Images with pictures of your email address is fine, but again, it's just
> a slightly more difficult form of "jacob at cachevalley dot com"...
> eventually wouldn't the spammers just create OCR software that looks for
> email addresses in images on websites linked from your website?
>

You're right. You'll reply to all, and the message to me will bounce. I
miss out on the conversation. That's my loss, or your frustration because
things weren't easy for you if you really needed to talk to me off list.

Same argument goes for me not answering the phone when someone who hates
answering machines calls me. We're two Zax waiting for the other to
yield. I say my phone is for my convenience and you say it is for yours. I
don't want to answer my phone because 90% of the time it's a solicitor, and
you want me to answer because you know you're not a solicitor.

http://www.eg.bucknell.edu/~cs315/subpages/inline/Zax.html

I agree that EHLO for keys may not be realistic. Maybe as IPsec grows in
usage, opportunistic IPsec may help out here.

Having spam harvesters doing OCR on a myriad of pictures (maybe mine is of
my car with my email at the top and yours is of your bike with the email
at the bottom) will cost them a lot more to process than it does now, and
may cost more than it costs you to filter spam.

Still, you have a good point that the address can still be extracted, and
that people will probably hate me for requiring them to look my email up
on the web. I'm not saying images are the solution. I'm looking for one.
Something that can be transferred as text in the email, but from which the
address is difficult to determine (on the order of doing OCR), would be
nice. Of course, as soon as everyone starts doing it, people will find a
way to crack it.

>> The sad part is that I've already squandered my username at this email
>> address by putting it where it can be harvested en masse by worm/virus
>> and UCE/UBE collection scripts, and I had already read an article
>> cautioning me against this. Oh well, live and learn (someday I'll learn
>> anyway.)
>>
>> I'm going to look into setting up a new email address with mail server
>> rules for delivery driven by a user supplied whitelist after waiting a
>> few days for comments and flames on this idea. If you know of links to
>> pages already discussing how to do this with postfix, please share
>> them.
>>
>>
>
> Look to SpamAssassin. That will make a huge dent in your spam problem.
> Tack on Amavis for the latest in MS malware, and you're in business. I
> believe both integrate fairly well with Postfix.
>
> Amavis is also able to reject viruses during the SMTP transaction. This
> I would agree with, if your configuration allows it.

I currently reject DOS/Windows executables during the SMTP DATA
transaction via Postfix body_checks. That means I'm scanning the whole of
every valid message just so that I might catch the invalid ones
(Postfix 1.0).
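
To illustrate why this kind of check needs the message content, here is a
Python sketch of the underlying idea: DOS/Windows executables start with
the bytes "MZ", so a base64-encoded attachment that begins with them can
be recognised from the first few characters of its first encoded line.
This is not Postfix configuration and not how body_checks is implemented
(body_checks uses pattern tables); the function name is made up.

```python
import base64
import binascii


def looks_like_dos_executable(body_line):
    """True if a base64 body line decodes to data starting with b'MZ'."""
    chunk = body_line.strip()[:4]   # 4 base64 chars decode to 3 bytes
    if len(chunk) < 4:
        return False
    try:
        decoded = base64.b64decode(chunk, validate=True)
    except (binascii.Error, ValueError):
        # Not base64 at all, so not an encoded attachment line.
        return False
    return decoded.startswith(b"MZ")


if __name__ == "__main__":
    first_line = base64.b64encode(b"MZ\x90\x00\x03\x00").decode("ascii")
    print(first_line, looks_like_dos_executable(first_line))
    print(looks_like_dos_executable("Hello, this is ordinary text"))
```

A check like this is only meaningful on the first line of a base64 part,
and ordinary text that happens to start with the same characters would
also match, so it is a rough filter at best. Either way it can only run
after DATA has been accepted.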

>
> Some good thoughts there... but I wonder just how many mailing lists
> would need to apply such a solution to make an impact, and how difficult
> it would be to apply. OTOH, you might find better results with simpler
> methods...
>

Well, for the Debian list, they don't have to do a thing for me to
implement my idea. It's all on my end since their DNS is right. For other
lists that already have a per-subscriber password, I don't see sending
from the hash of "youremail+listpasswd" @somelistserver.net as that big a
deal, as long as the list process does outgoing SMTP itself.

If I change my email, only subscribe to debian-user and use this system,
then wouldn't there be an immediate impact on my experience? It's all
about me after all ;). If others wanted to do the same thing to avoid
usenet/list collected email attacks then they can benefit (and miss out)
as well.

I think that an important problem to solve is coming up with a good way
of getting people your email if they want it (and if you want them to have
it) without making it easily accessible to a simple text parsing script. I
want them to have to have GnuPG, guile, perl, and OCR in their harvesting
program. ;)

-- 
Jacob
Trying out SquirrelMail