
Re: demographics of debian users (was: ratio of male vs. female debian users)

On Thu, Jul 18, 2002 at 10:17:53AM +0100, Colin Watson wrote:
> On Wed, Jul 17, 2002 at 11:25:53PM -0700, Osamu Aoki wrote:
> > On Thu, Jul 18, 2002 at 12:58:09AM +0100, Colin Watson wrote:
> > > I prefer filtering mail based on the character set, which I think is
> > > much more reliable. After all, even if somebody did send me an e-mail in
> > > Korean, I'm not going to be able to read it.
> > > 
> > > SpamAssassin takes care of enough of the rest that whatever does slip
> > > through doesn't bother me too much, and I filter rather than bounce so
> > > that I can deal with the very occasional false positive. That said,
> > > these two rules haven't had any false positives for me yet:
> > > 
> > >   # TODO: If this works, /dev/null?
> > >   :0:
> > >   * ^Content-Type: .*charset="?ks_c_5601-1987
> > >   spam
> > 
> > I do not think this is the right thing to do.  This is as bad as filtering
> > by domain name etc.  (I understand that this blocks a lot of spam with
> > minimal CPU cycles.)
> Osamu,
> So far I merely filter into a separate mailbox based on this, not delete
> it by default, so if a false positive does turn up then I'll see it in a
> day or two anyway. So far, in the month or two since I've been using
> this, I have not had a single false positive: nobody has ever sent me a
> mail in ASCII labelled as ks_c_5601-1987. (I'm very paranoid about
> losing valid mail, which is why I'm very conservative about
> /dev/null-ing things.)

I hear you and believe you.  (I tend to send some of the bad ones to /dev/null.)

> Naturally somebody who corresponds with Korean Windows users in the
> habit of sending ASCII mail labelled as what I understand is an obsolete
> and deprecated Korean character set will not be interested in this rule.
> However, for me this is much less of an elephant-gun approach than
> filtering by domain name - it won't filter e.g. Korean developers - and,

That is a good point.

> as you say, is efficient in terms of CPU cycles. (Before I moved to
> spamc, SpamAssassin once sent the load on my box to 30 or above when I
> uploaded a package that closed 40 bugs, and made the machine unusable
> for interactive use for about half an hour!)

Yeah, I know that scanning mail contents takes a lot of CPU time.  That is
the reason I stopped most of my content filtering.

> > >   # Apparently this is a legally-required Korean tag meaning "hello, I'm
> > >   # spam." For once, I'm going to believe the spammers.
> > >   :0:
> > >   * ^Subject: (.?????|.*????.?$)
> > >   spam
> > 
> > I use a similar high-bit filter mechanism and it catches all Korean spam.
> > 
> >    http://www3.sympatico.ca/walter.dnes/email/chinese/
> That link returns a 404.

Oops.  That URL is unreachable now.

I have a modified version of it here (this is a procmailrc):


 "SPAM ASIAN-8bit" is the filter you want.  The concept is very simple and
 versatile.  If you happen to use a high-bit encoding, as for French, you
 can even add an exception.  So far it has caught all Chinese and Korean
 (and maybe Japanese) spam.

 I also wrote a "SPAM ASIAN-7bit" filter, but it has never been triggered.
 (7-bit JIS falls here.  I never receive Japanese mail on this account.)

 I have many other content-based scripts there, most of which I have
 disabled.  The high-bit filter is quite effective and still in use.
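
 For anyone who wants to try the high-bit idea outside procmail, here is a
 rough Python sketch of the concept.  It is not the procmailrc mentioned
 above; the function names, the 0.25 threshold and the charset allow-list
 are all assumptions made up for illustration, so tune them to your own
 mail before trusting the result.

```python
from email import message_from_bytes

# Hypothetical allow-list: charsets you legitimately receive that use
# high-bit bytes (e.g. French mail in Latin-1). Adjust to taste.
ALLOWED_CHARSETS = {"iso-8859-1", "iso-8859-15", "windows-1252"}

# Hypothetical threshold: fraction of body bytes with the top bit set
# above which the message is treated as suspect.
HIGH_BIT_THRESHOLD = 0.25


def high_bit_ratio(data: bytes) -> float:
    """Return the fraction of bytes whose value is 0x80 or above."""
    if not data:
        return 0.0
    return sum(1 for b in data if b >= 0x80) / len(data)


def is_high_bit_suspect(raw: bytes) -> bool:
    """Flag a raw message whose body is mostly high-bit bytes."""
    # The exception: trust mail that declares an allow-listed charset.
    charset = (message_from_bytes(raw).get_content_charset() or "").lower()
    if charset in ALLOWED_CHARSETS:
        return False
    # Assumes LF line endings, as in a locally delivered mailbox.
    _, _, body = raw.partition(b"\n\n")
    return high_bit_ratio(body) >= HIGH_BIT_THRESHOLD
```

 As with the recipes earlier in the thread, it is safer to file suspect
 mail into a separate mailbox than to delete it, so that a false positive
 can still be found.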

I need an effective Nigerian scam filter :)

I will fix the URL when I next update CVS.

~\^o^/~~~ ~\^.^/~~~ ~\^*^/~~~ ~\^_^/~~~ ~\^+^/~~~ ~\^:^/~~~ ~\^v^/~~~ +++++
 Osamu Aoki @ Cupertino CA USA
 See "User's Guide":     http://www.debian.org/doc/manuals/users-guide/
 See "Debian reference": http://www.debian.org/doc/manuals/debian-reference/
 "Debian reference" Project at: http://qref.sf.net

 I welcome your constructive criticisms and corrections.
