
Re: demographics of debian users (was: ratio of male vs. female debian users)



On Wed, Jul 17, 2002 at 11:25:53PM -0700, Osamu Aoki wrote:
> On Thu, Jul 18, 2002 at 12:58:09AM +0100, Colin Watson wrote:
> > I prefer filtering mail based on the character set, which I think is
> > much more reliable. After all, even if somebody did send me an e-mail in
> > Korean, I'm not going to be able to read it.
> > 
> > SpamAssassin takes care of enough of the rest that whatever does slip
> > through doesn't bother me too much, and I filter rather than bounce so
> > that I can deal with the very occasional false positive. That said,
> > these two rules haven't had any false positives for me yet:
> > 
> >   # TODO: If this works, /dev/null?
> >   :0:
> >   * ^Content-Type: .*charset="?ks_c_5601-1987
> >   spam
> 
> I do not think this is the right thing to do.  This is as bad as
> filtering by domain name etc.  (I understand that this blocks a lot of
> spam with minimal CPU cycles.)

Osamu,

I only file mail matching this rule into a separate mailbox rather than
deleting it, so if a false positive does turn up then I'll see it within
a day or two anyway. In the month or two that I've been using the rule,
I have not had a single false positive: nobody has ever sent me ASCII
mail labelled as ks_c_5601-1987. (I'm very paranoid about losing valid
mail, which is why I'm very conservative about /dev/null-ing things.)

Naturally, this rule will not interest anybody who corresponds with
Korean Windows users in the habit of sending ASCII mail labelled with
what I understand to be an obsolete and deprecated Korean character set.
However, for me this is much less of an elephant-gun approach than
filtering by domain name - it won't filter e.g. Korean developers - and,
as you say, is efficient in terms of CPU cycles. (Before I moved to
spamc, SpamAssassin once sent the load on my box to 30 or above when I
uploaded a package that closed 40 bugs, and made the machine unusable
for interactive use for about half an hour!)
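
The CPU argument above comes down to ordering: cheap header matches run
first, and only mail that survives them is handed to SpamAssassin. A
minimal procmailrc sketch of that ordering follows. It is not the actual
configuration from this thread: it assumes spamd is running so that
spamc has something to talk to, the 256000-byte size cutoff is an
arbitrary illustrative value, and the last recipe relies on
SpamAssassin's usual X-Spam-Status header markup.

```procmail
# Cheap test first: one regex against the headers, no external process.
# Delivering here ends processing, so spamc never sees this message.
:0:
* ^Content-Type: .*charset="?ks_c_5601-1987
spam

# Everything else is piped through spamc as a filter (f), waiting for it
# to finish (w). The size limit is an assumed value, not from the thread.
:0fw
* < 256000
| spamc

# File whatever SpamAssassin tagged. It is filed, not sent to /dev/null,
# so a false positive can still be recovered from the mailbox.
:0:
* ^X-Spam-Status: Yes
spam
```

Because the charset recipe is a delivering recipe, a match stops
processing there, which is where the CPU saving would come from in a
setup like this.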

> >   # Apparently this is a legally-required Korean tag meaning "hello, I'm
> >   # spam." For once, I'm going to believe the spammers.
> >   :0:
> >   * ^Subject: (.?????|.*????.?$)
> >   spam
> 
> I use a similar high-bit filter mechanism, and it catches all Korean spam.
> 
>    http://www3.sympatico.ca/walter.dnes/email/chinese/

That link returns a 404.

Cheers,

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]

