
Re: Spam cleaning in list archives: one last effort needed



Quoting Frans Pop (elendil@planet.nl):

> As you can see we have a _very_ high ratio, so our scanning process has 
> been very effective. Other lists have had much more ham reported, but that 
> does show the importance of the review stage.

One should note that we now have 3 people (Sandro Tosi, Luca Falavigna
and /me) who have reviewed all nominated posts in all lists.

So, assuming that each of us takes care to review newly nominated
posts each week, we three can manage to review nominated posts as they
arrive with "minimal" effort (it takes me about 15 minutes a week).

Given this, and the fact that we have nearly completed the review of
-boot, I began nominating posts in another list (-l10n-french) along
with its contributors. And, of course, the same process can be repeated
for <insert your pet list here>.

> debian-user-german 1744 (10%)
> debian-user-spanish 1024 (13%)
> debian-chinese-gb 470 (33%)
> debian-user-portuguese 424 (30%)

Most non-English lists have a low ratio because they had a *lot* of
false positives. I assume that the posts there were "nominated" by
automated processes (maybe CRM114 or something similar, which often
gives false positives on non-English lists).

> debian-apache 412 (16%)
> cdwrite 296 (27%)


These two lists have very recurrent subjects, which seems to increase
the probability of false positives. For instance, most of Joerg
Schilling's posts were actually nominated in cdwrite... :-)


Again, thanks Frans for starting that interesting process. That was a
brilliant idea.

