
Re: Spam (ex:Re: Unicode: is it safe to use it ?)



on Sun, Sep 07, 2003 at 10:20:38AM +0200, Christophe Courtois (christophe@courtois.cc) wrote:
> On Sunday 7 September 2003 06:19, Karsten M. Self declaimed:
> > Well, another risk is people who use a fairly popular set of filters
> > which tag as spam anything that's more than a few percent (my own
> > threshold is 10%) non-Roman characters, or specified in any of the
> > following charsets:
> 
>  It depends on whom you communicate with.
> 
>  When I see an email from someone with an Anglo-Saxon name, with a subject
> in English, and not already in a mailing-list folder, I'm 99% sure that it is spam.
> This filter would be far more useful for me :-)

That's something you might be able to apply Bayesian training to.

The elegance of the charset filters is that it's trivial to apply a
filter based on a percentage of content being in an unreadable face.
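To illustrate how simple a percentage-based filter can be, here is a minimal Python sketch. It is an assumption-laden illustration, not any of the popular filters referred to above: it counts only alphabetic characters, treats a letter as "Roman" when its Unicode character name starts with "LATIN", and uses the 10% threshold mentioned earlier as a default. The function names `non_latin_fraction` and `looks_like_charset_spam` are hypothetical, and the sketch assumes the message body has already been decoded to text.

```python
import unicodedata


def non_latin_fraction(text: str) -> float:
    """Return the share of letters in `text` that are not Latin-script letters.

    Assumption: only alphabetic characters are counted; digits, punctuation
    and whitespace are ignored. A letter counts as "Roman" when its Unicode
    character name starts with "LATIN" (so accented letters like "é" pass).
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_latin = sum(
        1 for ch in letters if not unicodedata.name(ch, "").startswith("LATIN")
    )
    return non_latin / len(letters)


def looks_like_charset_spam(text: str, threshold: float = 0.10) -> bool:
    """Hypothetical rule: tag text as spam when the non-Latin share exceeds `threshold`.

    The 0.10 default mirrors the 10% threshold mentioned in the thread.
    """
    return non_latin_fraction(text) > threshold
```

For example, "Hello, world" scores 0.0 and is not tagged, while "Hello 你好世界" has 4 non-Latin letters out of 9 (about 44%) and is tagged. A real filter would also need to inspect the declared charsets in the message headers, which this sketch leaves out.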

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    "By failing to protect the public interest in free access to the
    products of the inventive and artistic genius -- indeed, by
    virtually ignoring the central purpose of the Copyright/Patent
    Clause [in the Constitution] -- the Court has quitclaimed to
    Congress its principal responsibility in this area of the law."
    -- Justice Stevens, J., dissenting, "Eldred v. Ashcroft"
