
Re: Spell checker as reasonable SPAM prevention tool



On Fri, Feb 11, 2011 at 10:42:49AM +0100, Samuel Thibault wrote:
> Andreas Tille wrote on Fri 11 Feb 2011 10:19:07 +0100:
> > PS: I assume that a spell checker can be configured so that it
> >     can distinguish between an English text with some / several
> >     mistakes and a text with, say, a 50% error rate, which is probably
> >     not understandable anyway.
> 
> Mmm, I think we've already had users who have even a 50% error rate,
> simply because they misspell things. Yes, not everybody has even a basic
> knowledge level in English, but they can still provide useful input to a
> mailing list.

Which limit to put on the error rate might be a topic for further
investigation, but I'm quite positive that there are reasonable
algorithms to detect what language a text is in, or rather to detect
whether a text attempts to be written in a certain language (which is
probably easier than guessing the language).  Whether it is worth doing
some stats on the mailing list archive depends on whether we actually
want this language detection method for a SPAM filter or not.

My guess is that you will find a ratio of misspelled words / total
number of words which is a clear sign of non-English text, then an
intermediate range where the postings you are worried about belong,
and then the postings which are obviously trying hard to be written in
English.  I'd like to get rid of the clearly non-English texts.  I have
the impression that we have been getting more and more of these for
some time, and I assume that Bayesian filters are not (yet) trained
well enough to detect them as SPAM.  So we need to find some other
means.
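
A rough sketch of that ratio test, in Python, just to make the idea
concrete.  The tiny word list and both thresholds (0.2 and 0.6) are
made-up placeholders, not values measured on the list archive; a real
filter would load a full dictionary and tune the limits against actual
postings.

```python
import re

# Hypothetical toy word list; a real filter would load a full English
# dictionary instead (assumption, not part of any existing setup).
KNOWN_WORDS = {
    "a", "an", "and", "the", "is", "are", "to", "of", "in", "i", "we",
    "you", "think", "spell", "checker", "list", "mail", "tool", "useful",
    "this", "that", "for", "not", "text",
}


def error_rate(text, known_words=KNOWN_WORDS):
    """Return unknown words / total words; 0.0 for text without words."""
    # [^\W\d_]+ matches runs of letters in any script, so non-Latin
    # text is counted as (unknown) words rather than silently skipped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in known_words)
    return unknown / len(words)


def classify(text, low=0.2, high=0.6, known_words=KNOWN_WORDS):
    """Sort a posting into one of three zones by its error rate.

    low and high are placeholder thresholds, chosen for illustration.
    """
    rate = error_rate(text, known_words)
    if rate >= high:
        return "non-english"   # candidate for the SPAM filter
    if rate > low:
        return "uncertain"     # weak spellers land here; do not reject
    return "english"
```

Only the "non-english" zone would be passed on to the filter; the
intermediate zone is deliberately left alone so that posters with weak
spelling are not rejected.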

Kind regards

       Andreas.

-- 
http://fam-tille.de

