
Bug#487874: Problem is in hunspell and not myspell-en-gb



I've now diagnosed this problem and (unless there are some stringent
rules on the contents of .aff files which I've been unable to find) the
problem lies in hunspell and not in myspell-en-gb.  It's just that the
extra complexity of myspell-en-gb tickles the bug in hunspell.

An example of a rule which hunspell fails to process correctly is:

SFX D 0 ed [aeio][aeiou][bcdfgkmnprstvz]

And a suitable word for demonstrating the problem is "entertained".
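To make the intended behaviour concrete, here is a minimal sketch in Python (not hunspell code) that reads the condition field of the rule as "the stem must end with these three character classes" and expresses it as an ordinary regular expression. The names CONDITION, rule_applies and apply_rule are made up for illustration. The "0" in the rule means nothing is stripped from the stem before "ed" is added.

```python
import re

# Condition field of "SFX D 0 ed [aeio][aeiou][bcdfgkmnprstvz]",
# anchored at the end of the stem because this is a suffix rule.
CONDITION = re.compile(r"[aeio][aeiou][bcdfgkmnprstvz]$")


def rule_applies(stem):
    """Return True if the end of the stem satisfies the condition."""
    return CONDITION.search(stem) is not None


def apply_rule(stem):
    """Strip nothing ("0") and append "ed" when the condition holds."""
    return stem + "ed" if rule_applies(stem) else None


if __name__ == "__main__":
    print(rule_applies("entertain"))  # the stem ends in 'a', 'i', 'n'
    print(apply_rule("entertain"))
```

Under this reading "entertain" satisfies the condition, so "entertained" ought to be accepted.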

The actual processing happens in the SfxEntry::test_condition method in
the affentry.cxx compilation unit.  It tries to work backwards through
the proposed stem ("entertain") and at the same time backwards through
the condition of the above rule, one bracketed group at a time.

When it finds the 'n' (the last letter of "entertain") in the third
group it sets a flag (called ingroup) and decrements its pointer into
the target word so that pointer is now pointing at the 'i' of
"entertain".  However it then carries on working its way through the
characters of the third group.  Fortunately there is no 'i' there to
find so the bug does no harm.

The code then moves on to the second group of characters, and works
backwards through it until it finds the 'i', which matches the letter
currently being pointed at in the target word.  It sets the flag again
and again decrements the pointer into the target word so now it points
at the 'a' in "entertain".  This time the bug bites.  Because the code
goes on processing the remaining characters in the second group it then
finds the 'a' there, which causes the pointer into the target word to be
decremented again.  Now it points to the second 't' in "entertain", so
when the code comes to process the first group of characters it can't
get a match (it wants to find 'a', but the pointer has already moved
past it).
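The walk described above can be reproduced with a small Python model. This is a simplified sketch of the behaviour as I have described it, not a transcription of the code in affentry.cxx; the function name buggy_test_condition and the trace list are my own inventions, and only the ingroup flag is taken from the real code.

```python
def buggy_test_condition(stem, groups):
    """Model of the described behaviour.

    Returns (matched, trace), where trace lists each (character, index)
    at which the pointer into the stem was decremented.
    """
    pos = len(stem) - 1                # pointer into the target word
    trace = []
    for group in reversed(groups):     # last group first
        ingroup = False
        for ch in reversed(group):     # backwards through the group
            if pos >= 0 and stem[pos] == ch:
                ingroup = True
                trace.append((ch, pos))
                pos -= 1               # the scan of this group carries on
        if not ingroup:
            return False, trace
    return True, trace


if __name__ == "__main__":
    rule = ["aeio", "aeiou", "bcdfgkmnprstvz"]
    matched, trace = buggy_test_condition("entertain", rule)
    print(matched)  # False
    print(trace)    # [('n', 8), ('i', 7), ('a', 6)]
```

The trace shows the pointer being moved twice while the second group is processed (once for 'i', once for 'a'), leaving nothing for the first group to match.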

As a quick demonstration that this explanation is correct, you can edit
the en_GB.aff file and reverse the middle group of the rule so that it
reads:

SFX D 0 ed [aeio][uoiea][bcdfgkmnprstvz]

Hunspell then successfully recognises "entertained" as being a word.
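The same simplified Python model (again a sketch, not hunspell's code) agrees with this experiment. It also suggests one possible remedy, which is to stop scanning a group as soon as a character has matched. The function name check_condition and the stop_on_match parameter are hypothetical, and I have not tested any such change against hunspell itself.

```python
def check_condition(stem, groups, stop_on_match=False):
    """Model of the backwards walk over the stem and the rule's groups."""
    pos = len(stem) - 1
    for group in reversed(groups):
        ingroup = False
        for ch in reversed(group):
            if pos >= 0 and stem[pos] == ch:
                ingroup = True
                pos -= 1
                if stop_on_match:
                    break              # assumed fix: leave the group at once
        if not ingroup:
            return False
    return True


ORIGINAL = ["aeio", "aeiou", "bcdfgkmnprstvz"]
REVERSED = ["aeio", "uoiea", "bcdfgkmnprstvz"]

if __name__ == "__main__":
    print(check_condition("entertain", ORIGINAL))                      # False
    print(check_condition("entertain", REVERSED))                      # True
    print(check_condition("entertain", ORIGINAL, stop_on_match=True))  # True
```

Reversing the group only helps because 'a' is then examined before 'i' rather than after it, so it is a demonstration and not a fix for the .aff file.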

Can this bug be reassigned to the hunspell source package?


