Re: demographics of debian users (was: ratio of male vs. female debian users)
On Thu, Jul 18, 2002 at 10:17:53AM +0100, Colin Watson wrote:
> On Wed, Jul 17, 2002 at 11:25:53PM -0700, Osamu Aoki wrote:
> > On Thu, Jul 18, 2002 at 12:58:09AM +0100, Colin Watson wrote:
> > > I prefer filtering mail based on the character set, which I think is
> > > much more reliable. After all, even if somebody did send me an e-mail in
> > > Korean, I'm not going to be able to read it.
> > >
> > > SpamAssassin takes care of enough of the rest that whatever does slip
> > > through doesn't bother me too much, and I filter rather than bounce so
> > > that I can deal with the very occasional false positive. That said,
> > > these two rules haven't had any false positives for me yet:
> > >
> > > # TODO: If this works, /dev/null?
> > > :0:
> > > * ^Content-Type: .*charset="?ks_c_5601-1987
> > > spam
> > I do not think this is the right thing to do. This is as bad as filtering
> > by domain name etc. (I understand that this blocks a lot of spam with
> > minimal CPU cycles.)
> So far I merely filter into a separate mailbox based on this, not delete
> it by default, so if a false positive does turn up then I'll see it in a
> day or two anyway. So far, in the month or two since I've been using
> this, I have not had a single false positive: nobody has ever sent me a
> mail in ASCII labelled as ks_c_5601-1987. (I'm very paranoid about
> losing valid mail, which is why I'm very conservative about
> /dev/null-ing things.)
I hear you and believe you. (I tend to send some of the bad ones straight to /dev/null.)
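
For anyone who wants to see the logic outside procmail, here is a minimal Python sketch of the charset rule quoted above. It assumes the raw message is available as a string. The function name `target_mailbox` and the mailbox names are made up for illustration. Like the quoted rule, it only picks a separate mailbox and never deletes anything.

```python
from email import message_from_string

# Charset labels to divert. "ks_c_5601-1987" is the label from the quoted
# procmail rule; anything else added here would be an assumption.
SUSPECT_CHARSETS = {"ks_c_5601-1987"}


def target_mailbox(raw_message: str) -> str:
    """Return the (hypothetical) mailbox name for a raw RFC 822 message."""
    msg = message_from_string(raw_message)
    # get_content_charset() returns the charset parameter of the top-level
    # Content-Type header in lower case, or None if there is none.
    charset = msg.get_content_charset()
    if charset in SUSPECT_CHARSETS:
        return "spam"   # file it separately, so false positives stay visible
    return "inbox"
```

One difference from the procmail regexp: this compares the whole charset label, while the quoted rule matches any label that merely starts with ks_c_5601-1987.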
> Naturally somebody who corresponds with Korean Windows users in the
> habit of sending ASCII mail labelled as what I understand is an obsolete
> and deprecated Korean character set will not be interested in this rule.
> However, for me this is much less of an elephant-gun approach than
> filtering by domain name - it won't filter e.g. Korean developers - and,
That is a good point.
> as you say, is efficient in terms of CPU cycles. (Before I moved to
> spamc, SpamAssassin once sent the load on my box to 30 or above when I
> uploaded a package that closed 40 bugs, and made the machine unusable
> for interactive use for about half an hour!)
Yeah, I know that scanning mail contents takes a lot of CPU time. That is
the reason I stopped most of my content filtering.
> > > # Apparently this is a legally-required Korean tag meaning "hello, I'm
> > > # spam." For once, I'm going to believe the spammers.
> > > :0:
> > > * ^Subject: (.?????|.*????.?$)
> > > spam
> > I use a similar high-bit filter mechanism, and it catches all Korean spam.
> > http://www3.sympatico.ca/walter.dnes/email/chinese/
> That link returns a 404.
Oops. That URL is unreachable now.
I have a modified version of it here (this is a procmailrc):
"SPAM ASIAN-8bit" is the filter you want. The concept is very simple and
versatile. If you happen to use a high-bit encoding, as for French, you
can even make an exception. So far all Chinese and Korean (and maybe
Japanese) spam has been killed.
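
The procmailrc itself is not reproduced in this mail, so the following is only a Python sketch of the high-bit idea, not the actual "SPAM ASIAN-8bit" recipe. The threshold of four consecutive high-bit bytes, the charset allow list used for the exception, and both function names are assumptions made for illustration.

```python
def has_high_bit_run(data: bytes, min_run: int = 4) -> bool:
    """True if data contains at least min_run consecutive bytes >= 0x80."""
    run = 0
    for byte in data:
        if byte >= 0x80:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def is_asian_8bit(subject: bytes, charset: str = "",
                  allowed=("iso-8859-1", "iso-8859-15")) -> bool:
    """Flag a raw Subject line that looks like multi-byte 8-bit text.

    Charsets in `allowed` (a hypothetical exception list, e.g. for French
    correspondents) are never flagged.
    """
    if charset.lower() in allowed:
        return False
    return has_high_bit_run(subject)
```

Accented Latin text usually has isolated high-bit bytes, while EUC-KR, GB2312 and Big5 text has long runs of them, so the run-length test already separates most French mail from Korean or Chinese mail before the allow list is consulted.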
I also wrote a "SPAM ASIAN-7bit" filter, but it has never been triggered.
(7-bit JIS falls under this one. I never receive Japanese mail on this account.)
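
Why a high-bit test cannot see 7-bit JIS: ISO-2022-JP keeps every byte below 0x80 and switches character sets with escape sequences instead. The Python sketch below looks for the two escape sequences that select JIS X 0208. It is an assumption about how such a check could work, not the "SPAM ASIAN-7bit" recipe itself, and the function name is made up.

```python
# Escape sequences that switch ISO-2022-JP text into two-byte JIS X 0208.
JIS_ESCAPES = (b"\x1b$B", b"\x1b$@")


def looks_like_7bit_jis(data: bytes) -> bool:
    """True if data contains an ISO-2022-JP shift into JIS X 0208."""
    return any(esc in data for esc in JIS_ESCAPES)
```

A quick check with Python's own codec: `"こんにちは".encode("iso2022_jp")` contains only 7-bit bytes, yet `looks_like_7bit_jis` returns True for it.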
I have many other content-based scripts here, most of which I have disabled.
The high-bit filter is quite effective and still in use.
I need an effective Nigerian scam filter :)
I will fix the URL issues when I update CVS.
~\^o^/~~~ ~\^.^/~~~ ~\^*^/~~~ ~\^_^/~~~ ~\^+^/~~~ ~\^:^/~~~ ~\^v^/~~~ +++++
Osamu Aoki @ Cupertino CA USA
See "User's Guide": http://www.debian.org/doc/manuals/users-guide/
See "Debian reference": http://www.debian.org/doc/manuals/debian-reference/
"Debian reference" Project at: http://qref.sf.net
I welcome your constructive criticisms and corrections.