[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#514495: [lib/Spelling.pm] check the spelling of large texts in a more efficient way



Russ Allbery wrote:
[...]
> 
> Have you benchmarked this?  My intuition says that if this makes any
> difference at all, it will be a performance *degredation*.  You're now
> walking the entire text for every typo we know about instead of doing an
> O(1) hash table lookup for each word.  It's converting an O(n) check into
> an O(n^2) check.
> 

There might indeed be a degradation on those texts where the number of
characters is near the limit of the methods switcher. Other than those
cases, when the number of words in the text is greater than the number of
words in the hash there should be a performance improvement when searching
for every known word in the text (there is indeed a performance penalty
because of the regex and the text traversal, but it is not enough to make
it slower than the other method).

Without the patch:
$ time
xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb
[...]
real    0m8.380s
user    0m5.596s
sys     0m0.640s
(this is normal as the file is being read directly from the disk, not in
memory)
$ time
xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb
[...]
real    0m6.701s
user    0m5.800s
sys     0m0.684s
$ time
xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb
[...]
real    0m6.424s
user    0m5.740s
sys     0m0.520s

This time with the patch:
$ time
xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb
[...]
real    0m6.613s
user    0m5.728s
sys     0m0.580s
(Let's ignore this one, so that disk reads do not influence the results)
$ time
xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb
[...]
real    0m6.254s
user    0m5.564s
sys     0m0.540s
$ time
xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb
[...]
real    0m6.311s
user    0m5.456s
sys     0m0.600s

(xlintian is an alias to lintian from my git repo).

As you can see with the patch it is faster.

Cheers,
-- 
Raphael Geissert - Debian Maintainer
www.debian.org - get.debian.net





Reply to: