
Re: [RFR] templates://crm114/{crm114.templates}



Milan Zamazal wrote:
> Thank you, Christian, for your suggestions.  I agree with your proposed
> changes with the following exceptions:
> 
>>>>>> "CP" == Christian Perrier <bubulle@debian.org> writes:
> 
>     CP> +Description: versatile filtering system for email and other
>     CP> data
> 
> crm114 is not a filtering system, it's a classifying system.

Fair enough; so "versatile classifying system for e-mail and other
data", or possibly "versatile classifier for e-mail and other data"?

>     CP> - Accuracy of the SBPH/BCR classifier has been seen in excess of 99 per cent,
>     CP> - for 1/4 megabyte of learning text. In other words, CRM114 learns, and it
>     CP> - learns fast.
> 
>     CP> The last sentences are a little bit too close to "advertisement" as
>     CP> discouraged by the DevRef. Neutral language is the key, here.
> 
> I agree the wording could be improved.  But the information that crm114
> is accurate and that it learns fast is true and it is important for the
> user (this is the most important reason I use crm114 and not another
> classifier after all), so it shouldn't be removed.

Surely nobody would set out to choose an inaccurate, slow-learning
spam filter, but popcon tells me crm114 has only a few dozen active
users, compared to thousands using bogofilter.  I see papers online
comparing the various algorithms used; there's even a page written
in 2002 by a bogofilter developer trying out a version that adopts
the CRM114 algorithm... but this didn't go anywhere.  Why?  Does
CRM114 have disadvantages - like being harder to set up, slower at
processing individual messages, or more resource-heavy?  Or is it
unjustly overlooked?
-- 
JBR	with qualifications in linguistics, experience as a Debian
	sysadmin, and probably no clue about this particular package

