Re: Spam in the lists out of control
Duncan Findlay wrote:
> Pfff... you can have my 63,286 spams if you really want, but it won't
> really help you. The thing with a Bayesian database is that the mail
> it's trained on needs to be similar to the mail it will be tested
That's true for legitimate mail, but spam is very similar
for everyone. I even get spam in Chinese, although I can't
read a single glyph of it.
> What is more likely an issue is that the scores are not ideally set to
> debian's needs. I have previously volunteered my assistance to run the
> "perceptron" to generate better scores for Debian; however the problem
> seems to be compiling a relatively large corpus of hand-sorted spam
> and non-spam from debian lists.
It should not be too hard for a large project to have one person
per mailing list collect a few hundred legitimate mails, which would
add up to thousands of mails overall. That should result in good
filtering; at least it does for me.
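
To illustrate the idea, here is a minimal sketch of a naive Bayes
classifier trained on shared spam plus list-specific legitimate mail.
It is not SpamAssassin's actual Bayes implementation; the function
names, the whitespace tokenizer, and the tiny sample corpora are all
made up for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count in how many messages each lowercase token appears."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts, len(messages)

def spam_probability(text, spam_model, ham_model):
    """Naive Bayes estimate with add-one smoothing, computed as log odds."""
    spam_counts, n_spam = spam_model
    ham_counts, n_ham = ham_model
    # Start from the class priors, then add one term per distinct token.
    log_odds = math.log(n_spam / n_ham)
    for token in set(text.lower().split()):
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical corpora: spam looks alike everywhere, ham is list-specific.
shared_spam = ["cheap pills buy now", "win money now", "buy cheap watches"]
list_ham = [
    "package upload failed on amd64",
    "bug report for the kernel package",
    "please review my patch",
]

spam_model = train(shared_spam)
ham_model = train(list_ham)
```

With these toy corpora, a message such as "buy cheap pills" scores
well above 0.5 and "kernel patch review" well below it. A real setup
would of course need the thousands of hand-sorted mails mentioned above.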
But as Marco d'Itri pointed out, Bayesian filtering is not an option
due to CPU limitations.