on Sun, Sep 07, 2003 at 10:20:38AM +0200, Christophe Courtois (christophe@courtois.cc) wrote:
> On Sunday 7 September 2003 06:19, Karsten M. Self declaimed:
> > Well, another risk is people who use a fairly popular set of filters
> > which tag as spam anything that's more than a few percent (my own
> > threshold is 10%) non-roman characters, or specified in any of the
> > following charsets:
>
> It depends on who you communicate with.
>
> When I see mail from an Anglo-Saxon name, with a subject in English,
> and not already in a mailing-list folder, I'm 99% sure that it is spam.
> This filter would be far more useful for me :-)

That's something you might be able to apply Bayesian training to.

The elegance of the charset filters is that it's trivial to write a
rule that fires when more than a set percentage of a message's content
is in a script the recipient can't read.
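
For illustration only, here's a minimal sketch of that kind of
percentage test in Python. It is not the filter set I actually run.
The function names, and the choice to count anything above U+024F (the
end of the Latin Extended-B block) as "non-roman", are assumptions made
for the example. The 10% default is the threshold mentioned earlier in
the thread.

```python
# Hypothetical helper names; the U+024F cutoff is an assumption for this sketch.
LATIN_MAX = 0x024F  # last code point of the Latin Extended-B block


def non_roman_fraction(text: str) -> float:
    """Return the fraction of non-whitespace characters above LATIN_MAX."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    non_roman = sum(1 for c in chars if ord(c) > LATIN_MAX)
    return non_roman / len(chars)


def looks_like_spam(text: str, threshold: float = 0.10) -> bool:
    """Tag a message when its non-roman share exceeds the threshold."""
    return non_roman_fraction(text) > threshold
```

Accented Western European text (French, German and so on) stays under
the cutoff, so looks_like_spam("café au lait") is False, while a body
that is mostly CJK or Cyrillic trips the rule.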

Peace.

--
Karsten M. Self <kmself@ix.netcom.com> http://kmself.home.netcom.com/
What Part of "Gestalt" don't you understand?
"By failing to protect the public interest in free access to the
products of the inventive and artistic genius -- indeed, by
virtually ignoring the central purpose of the Copyright/Patent
Clause [in the Constitution] -- the Court has quitclaimed to
Congress its principal responsibility in this area of the law."
-- Justice Stevens, J., dissenting, "Eldred v. Ashcroft"