
Re: Spell checker as reasonable SPAM prevention tool



On Fri, Feb 11, 2011 at 10:19:07AM +0100, Andreas Tille wrote:
> For some time now we have been getting more and more SPAM which is easy
> for me to detect (and most probably easy to detect automatically): SPAM
> in languages I simply do not understand and which are definitely not
> English.  Wouldn't it be a reasonable means for a SPAM filter to mark
> mails which blatantly fail a spell checker as potential SPAM, and just
> apply this filter to all Debian lists?  We have defined languages for
> each list, and the "one mail per month" where a user just writes in the
> wrong language by accident will probably not harm the project.

I've been thinking about this as well for my personal domain.
Debian has tools that can determine the language of a document
(libtextcat and friends).  The idea would be to reject emails that are
70% or more composed of languages that I have no hope of speaking or
understanding (i.e., everything but English, Spanish, French, and
Portuguese).  I chose 70% as the threshold because Debian lists
sometimes get mails written in both English and another language (in
hopes of being understood), and I wouldn't want to penalize those users.
I haven't implemented this, but I might at some point.
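
A minimal sketch of the threshold check described above, in Python.  The
`detect` callable is a stand-in for whatever language identifier is used
(a libtextcat binding, for instance); its interface here is an
assumption, as are the function names and the choice to split on blank
lines and weight each paragraph by its character count.

```python
from typing import Callable, Collection

# Languages the recipient can read (from the message above), as ISO 639-1 codes.
ACCEPTED = frozenset({"en", "es", "fr", "pt"})
# Reject when at least this fraction of the body is in other languages.
THRESHOLD = 0.70


def foreign_fraction(
    body: str,
    detect: Callable[[str], str],
    accepted: Collection[str] = ACCEPTED,
) -> float:
    """Return the fraction of the body (by characters) not in an accepted language.

    `detect` is a hypothetical caller-supplied function that maps a piece
    of text to a language code; it is not a real library API.
    """
    # Treat blank-line-separated blocks as paragraphs and ignore empty ones.
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    total = sum(len(p) for p in paragraphs)
    if total == 0:
        return 0.0  # nothing to classify, so nothing counts as foreign
    foreign = sum(len(p) for p in paragraphs if detect(p) not in accepted)
    return foreign / total


def should_reject(
    body: str,
    detect: Callable[[str], str],
    accepted: Collection[str] = ACCEPTED,
    threshold: float = THRESHOLD,
) -> bool:
    """True when the foreign-language share meets or exceeds the threshold."""
    return foreign_fraction(body, detect, accepted) >= threshold
```

With this weighting, a mail that is half English and half another
language scores 0.5 and passes, while a mail written entirely in an
unaccepted language scores 1.0 and is rejected.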

Obviously, this would have to be adjusted per-list; we wouldn't want to
reject German-language emails to debian-user-german.  I also think
language testing is better than spell checking for English because
honestly English has a lot of pretty irregular and bizarre spellings; I
say this as someone whose native language is English and who spells
fairly decently.  A spell checker might catch more legitimate emails
than we'd like.
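
The per-list adjustment could be as simple as a lookup table from list
name to accepted languages.  This is only a sketch: the table contents,
the English-only default, and the function name are assumptions, not
how the Debian list software is actually configured.

```python
# Hypothetical default: lists without an entry accept English only.
DEFAULT_ACCEPTED = frozenset({"en"})

# Hypothetical per-list overrides, keyed by list name.  Whether a
# language-specific list should also accept English is a policy choice.
LIST_LANGUAGES = {
    "debian-user-german": frozenset({"de", "en"}),
}


def accepted_languages(list_name: str) -> frozenset:
    """Return the set of language codes accepted on the given list."""
    return LIST_LANGUAGES.get(list_name.lower(), DEFAULT_ACCEPTED)
```

The resulting set would then be what the language filter compares each
detected language against for mail sent to that list.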

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
