Bug#514495: [lib/Spelling.pm] check the spelling of large texts in a more efficient way

To: 514495@bugs.debian.org
Subject: Bug#514495: [lib/Spelling.pm] check the spelling of large texts in a more efficient way
From: Russ Allbery <rra@debian.org>
Date: Sat, 07 Feb 2009 17:25:54 -0800
Message-id: <873aep90t9.fsf@windlord.stanford.edu>
Reply-to: Russ Allbery <rra@debian.org>, 514495@bugs.debian.org
In-reply-to: <200902071903.09664.atomo64@gmail.com> (Raphael Geissert's message of "Sat, 7 Feb 2009 19:03:09 -0600")
References: <200902071903.09664.atomo64@gmail.com>

Raphael Geissert <atomo64@gmail.com> writes:

> Commit message explains it:
>>     When spell checking large texts determine what's more convenient: to
>> look up every word in the text for spelling mistakes or to look for known
>> spelling mistakes in the text.
>>
>>     This should speed up checking large texts, with the only, minor,
>> consequence being that only the first match of a spelling mistake is found
>> and warned about; but since the line numbers are not printed it is not a
>> big deal.
>>
>>     Additionally move some regular expressions and other operations so that
>> they are performed once for all the text, instead of doing it once on every
>> word.

Have you benchmarked this?  My intuition says that if this makes any
difference at all, it will be a performance *degradation*.  You're now
walking the entire text for every typo we know about instead of doing an
O(1) hash table lookup for each word.  It's converting an O(n) check into
an O(n^2) check.
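
To make the comparison concrete, here is a rough sketch of the two strategies. It is written in Python rather than lintian's Perl, and the typo table and function names are hypothetical, not taken from lib/Spelling.pm. The first function does one hash lookup per word; the second scans the whole text once per known typo, so its cost grows with the number of typos times the length of the text, and it reports each typo at most once.

```python
import re

# Hypothetical typo table for illustration only; the real list is
# maintained in lintian and is much longer.
KNOWN_TYPOS = {
    "degredation": "degradation",
    "consecuence": "consequence",
    "recieve": "receive",
}


def check_per_word(text):
    """Split the text into words and do one hash lookup per word."""
    found = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in KNOWN_TYPOS:
            found.append((word, KNOWN_TYPOS[word]))
    return found


def check_per_typo(text):
    """Scan the whole text once for every known typo.

    Each typo is reported at most once, however often it occurs.
    """
    lowered = text.lower()
    found = []
    for typo, fix in KNOWN_TYPOS.items():
        if re.search(r"\b" + re.escape(typo) + r"\b", lowered):
            found.append((typo, fix))
    return found
```

Timing both functions on a real changelog or description with the full typo list (for example with Python's timeit module) would settle the question better than intuition.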

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>
