Re: demographics of debian users (was: ratio of male vs. female debian users)

To: debian-user@lists.debian.org
Subject: Re: demographics of debian users (was: ratio of male vs. female debian users)
From: Osamu Aoki <debian@aokiconsulting.com>
Date: Thu, 18 Jul 2002 22:16:59 -0700
Message-id: <20020719051659.GB29610@aokiconsulting.com>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <20020718091752.GA1731@riva.ucam.org>
References: <873cuk8tiw.fsf@jidanni.org> <20020716172952.GD7500@silk.kitenet.net> <E17UWXg-0003JX-00@benzone.com> <200207161839.OAA16807@dewberry.cc.columbia.edu> <20020717025543.GA2437@silk.kitenet.net> <87n0sqnev5.fsf@jidanni.org> <20020717235809.GA28038@riva.ucam.org> <20020718062553.GA24307@aokiconsulting.com> <20020718091752.GA1731@riva.ucam.org>

Hi,
On Thu, Jul 18, 2002 at 10:17:53AM +0100, Colin Watson wrote:
> On Wed, Jul 17, 2002 at 11:25:53PM -0700, Osamu Aoki wrote:
> > On Thu, Jul 18, 2002 at 12:58:09AM +0100, Colin Watson wrote:
> > > I prefer filtering mail based on the character set, which I think is
> > > much more reliable. After all, even if somebody did send me an e-mail in
> > > Korean, I'm not going to be able to read it.
> > > 
> > > SpamAssassin takes care of enough of the rest that whatever does slip
> > > through doesn't bother me too much, and I filter rather than bounce so
> > > that I can deal with the very occasional false positive. That said,
> > > these two rules haven't had any false positives for me yet:
> > > 
> > >   # TODO: If this works, /dev/null?
> > >   :0:
> > >   * ^Content-Type: .*charset="?ks_c_5601-1987
> > >   spam
> > 
> > I do not think this is right thing to do.  This is as bad as filtering
> > by domain name etc.  (I understand that this prevents many spams with
> > minimum CPU cycle.)
> 
> Osamu,
> 
> So far I merely filter into a separate mailbox based on this, not delete
> it by default, so if a false positive does turn up then I'll see it in a
> day or two anyway. So far, in the month or two since I've been using
> this, I have not had a single false positive: nobody has ever sent me a
> mail in ASCII labelled as ks_c_5601_1987. (I'm very paranoid about
> losing valid mail, which is why I'm very conservative about
> /dev/null-ing things.)

I hear you and believe you.  (I tend to send some of the worst ones
straight to /dev/null.)

> Naturally somebody who corresponds with Korean Windows users in the
> habit of sending ASCII mail labelled as what I understand is an obsolete
> and deprecated Korean character set will not be interested in this rule.
> However, for me this is much less of an elephant-gun approach than
> filtering by domain name - it won't filter e.g. Korean developers - and,

That is a good point.

> as you say, is efficient in terms of CPU cycles. (Before I moved to
> spamc, SpamAssassin once sent the load on my box to 30 or above when I
> uploaded a package that closed 40 bugs, and made the machine unusable
> for interactive use for about half an hour!)

Yeah, I know that scanning mail contents takes a lot of CPU time.  That
is why I stopped most of my content filtering.

> > >   # Apparently this is a legally-required Korean tag meaning "hello, I'm
> > >   # spam." For once, I'm going to believe the spammers.
> > >   :0:
> > >   * ^Subject: (.?????|.*????.?$)
> > >   spam
> > 
> > I use similar high bit filter mechanism and it captures all Korean Spams.
> > 
> >    http://www3.sympatico.ca/walter.dnes/email/chinese/
> 
> That link returns a 404.

Oops.  That URL is unreachable now.

I keep a modified version of it here (it is a procmailrc):

 http://www.debian.org/doc/manuals/debian-reference/examples/_procmailrc

 "SPAM ASIAN-8bit" is the filter you want.  The concept is very simple
 and versatile.  If you happen to use a high-bit encoding, as for
 French, you can even add an exception.  So far it has killed all
 Chinese and Korean (and maybe Japanese) spam.

 I also wrote a "SPAM ASIAN-7bit" filter, but it has never been
 triggered.  (7-bit JIS falls into this category.  I never receive
 Japanese mail on this account.)

 The file has many other content-based filters, most of which I have
 disabled.  The high-bit filter is quite effective and still in use.
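
 To illustrate the high-bit idea, here is a minimal sketch.  It is an
 illustration only, not a copy of the recipe in the file above; the
 folder name "spam-8bit" and the ISO-8859-1 exception are assumptions
 made for this example.

```procmail
# Sketch only (hypothetical, not the "SPAM ASIAN-8bit" recipe itself).
# File mail whose Subject contains a run of four consecutive bytes
# outside printable ASCII (space through tilde), as raw 8-bit Chinese
# or Korean text usually does.  The negated condition is the assumed
# exception: mail declared as ISO-8859-1 (e.g. French) is let through.
:0:
* ! ^Content-Type:.*charset="?iso-8859-1
* ^Subject:.*[^ -~][^ -~][^ -~][^ -~]
spam-8bit
```

 Filing into a folder instead of /dev/null keeps the occasional false
 positive recoverable.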

I need an effective Nigerian scam filter :)

I will fix the URL when I next update CVS.

-- 
~\^o^/~~~ ~\^.^/~~~ ~\^*^/~~~ ~\^_^/~~~ ~\^+^/~~~ ~\^:^/~~~ ~\^v^/~~~ +++++
 Osamu Aoki @ Cupertino CA USA
 See "User's Guide":     http://www.debian.org/doc/manuals/users-guide/
 See "Debian reference": http://www.debian.org/doc/manuals/debian-reference/
 "Debian reference" Project at: http://qref.sf.net

 I welcome your constructive criticisms and corrections.


