Bug#514495: [lib/Spelling.pm] check the spelling of large texts in a more efficient way
- To: 514495@bugs.debian.org
- Subject: Bug#514495: [lib/Spelling.pm] check the spelling of large texts in a more efficient way
- From: Russ Allbery <rra@debian.org>
- Date: Sun, 08 Mar 2009 18:59:14 -0700
- Message-id: <87zlfvpibx.fsf@windlord.stanford.edu>
- Reply-to: Russ Allbery <rra@debian.org>, 514495@bugs.debian.org
- In-reply-to: <200902242339.28139.atomo64@gmail.com> (Raphael Geissert's message of "Tue, 24 Feb 2009 23:39:27 -0600")
- References: <200902071903.09664.atomo64@gmail.com> <498e7d1d.030bca0a.6b6f.496b@mx.google.com> <871vu81ezr.fsf@windlord.stanford.edu> <200902242339.28139.atomo64@gmail.com>
Raphael Geissert <atomo64@gmail.com> writes:
> Anyway, I have written several different implementations. One is similar
> to the one I previously wrote, but turns the whole list of known bad
> words into one big ORed regex; as expected, it turned out a lot faster
> than my first one. But the vast majority of the time it was still slower
> than the current algorithm.
>
> These are the benchmark results of several methods, all dropping the
> regex that strips most non-word characters.
>
> On the output of strings /usr/bin/php5 (50 times):
>        Rate   bts  orig newfg
> bts  7.74/s    --  -44%  -61%
> orig 13.7/s   77%    --  -30%
> newg 19.7/s  154%   43%    --
>
> On /usr/share/common-licenses/GPL-3 (1000 times):
>        Rate   bts  orig   new
> bts  58.6/s    --  -60%  -76%
> orig  146/s  148%    --  -40%
> new   242/s  312%   66%    --
>
> bts: the one I first submitted on this bug report
> orig: the current one
> new: the proposed one
>
> The idea behind removing the regex that strips all non-alphabetic
> characters is that the likelihood of the resulting "word" being an
> actual match should be extremely low. Instead, the replacement takes
> care of removing dots, commas, and other symbols that are commonly used
> in sentences.
Yeah, this looks much better. Applied with one change: keeping hyphens to
match the behavior of the previous code.
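
For anyone following the thread, here is a minimal sketch of the two
approaches discussed above: one big ORed regex built from the list of known
bad words, and a per-word lookup that strips only common sentence
punctuation while keeping hyphens. It is written in Python purely for
illustration and is not the Perl code applied to lib/Spelling.pm. The
CORRECTIONS table, the function names, and the exact punctuation set are
all assumptions made for the example.

```python
import re

# Hypothetical corrections table (assumption for this sketch); the real
# list of known bad words lives in lib/Spelling.pm.
CORRECTIONS = {
    "accomodate": "accommodate",
    "likelyhood": "likelihood",
    "seperate": "separate",
    "pre-requisite": "prerequisite",
}

# Approach 1: one big ORed regex over every known misspelling.
# Longer entries are listed first so a shorter one cannot shadow them,
# and the lookarounds stop matches inside longer or hyphenated words.
BAD_WORD_RE = re.compile(
    r"(?<![\w-])(?:"
    + "|".join(re.escape(w) for w in sorted(CORRECTIONS, key=len, reverse=True))
    + r")(?![\w-])",
    re.IGNORECASE,
)


def find_with_alternation(text):
    """Return the known misspellings found by the single big regex."""
    return [m.group(0).lower() for m in BAD_WORD_RE.finditer(text)]


# Approach 2: split on whitespace, strip only punctuation that is common
# in sentences (hyphens are kept), then do one dictionary lookup per word.
# The punctuation set below is an assumption, not the set used in Lintian.
SENTENCE_PUNCTUATION = ".,;:!?()\"'"


def find_with_lookup(text):
    """Return the known misspellings found by per-word lookup."""
    found = []
    for token in text.lower().split():
        word = token.strip(SENTENCE_PUNCTUATION)
        if word in CORRECTIONS:
            found.append(word)
    return found
```

Both functions report the same words on ordinary prose; which one is faster
depends on the input and the regex engine, so the benchmark figures quoted
above should not be assumed to carry over to this sketch.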
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>