Bug#561089: ispell: segfaults on checking any file
On Mon, 14 Dec 2009, Agustin Martin wrote:
> On Mon, Dec 14, 2009 at 02:03:35PM +0100, Helmut Grohne wrote:
> > Package: ispell
> > Version: 188.8.131.52-7
> > Severity: grave
> > Justification: renders package unusable
> > When I try to spell check any file ispell simply segfaults. When I
> > invoke it without arguments it prints the help text. To find out
> > whether this was a recent regression I downgraded the package, but
> > that did not help. This indicates that the cause may be unrelated
> > to ispell. I also ran gdb on a core file from ispell, but the
> > traceback did not reveal anything useful. Running strace on ispell
> > shows that it uses curses to set up the terminal and then
> > segfaults right after reading the file to be checked.
> > Is there anything else I can do to help diagnose this?
> That seems related to this ingerman installation error:
> Unpacking ingerman (from .../ingerman_20091006-2_all.deb) ...
> Setting up ingerman (20091006-2) ...
> ispell-autobuildhash: Processing 'ngerman' dict
> Hash table overflowed by 1342 words
I can reproduce the problem here. It seems to be a problem in
buildhash: the function filltable() in buildhash.c prints this error
message when I try to build the ngerman hash file:
$ gzip -dc /usr/share/ispell/ngerman.mwl.gz > /tmp/ngerman.mwl
$ buildhash -s /tmp/ngerman.mwl /usr/lib/ispell/ngerman.aff /tmp/ngerman.hash
Hash table overflowed by 1371 words
With some sorting and tweaking of ngerman.mwl I can change the number
of overflowed words, but I wasn't able to reduce it to zero.
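
To compare such experiments more easily, the overflow count could be
extracted from the buildhash output by a small script. The following
is only a sketch: the helper names overflow_count() and run_buildhash()
are made up for illustration, the message format is taken from the
output quoted above, and stdout and stderr are merged because I have
not checked which stream carries the message.

```python
import re
import subprocess

# Message format as printed in the buildhash run quoted above.
OVERFLOW_RE = re.compile(r"Hash table overflowed by (\d+) words")


def overflow_count(output):
    """Return the overflow count found in buildhash output, or 0 if absent."""
    match = OVERFLOW_RE.search(output)
    return int(match.group(1)) if match else 0


def run_buildhash(wordlist, affix, hashfile):
    """Run buildhash as in the commands above and return the overflow count."""
    # Hypothetical wrapper: merges stderr into stdout so the message is
    # found regardless of the stream it is written to.
    result = subprocess.run(
        ["buildhash", "-s", wordlist, affix, hashfile],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return overflow_count(result.stdout)
```

Calling run_buildhash("/tmp/ngerman.mwl", "/usr/lib/ispell/ngerman.aff",
"/tmp/ngerman.hash") after each change to the word list would then give
one number per attempt.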
> Does this happen with other dictionaries?
I know that this didn't happen with the old 20071211 ngerman
dictionary, but the upstream maintainer of this dictionary was quite
active and introduced many changes in the 20091006 version.
I don't understand why this hash table overflow in buildhash leads to
the segmentation fault in ispell, but I can reproduce the problem
here. I would expect only some words to be missing...
I just checked whether buildhash from ispell 184.108.40.206 behaves
differently, but I get the same error message there.
Do you see a chance to increase the size of the hash table to get rid
of this problem? Or do we have to modify the German dictionary to fit
into the data structure (but where should I start skipping words)?
Removing just a few thousand lines from the end of the input file
doesn't solve the problem, so perhaps it is not the number of lines
but some particular lines that trigger the problem. But which lines?
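
One way to search for such lines would be to shrink the word list
automatically while the overflow still occurs. Below is a rough sketch
of a simplified delta-debugging loop. The function minimize() and the
predicate fails() are hypothetical names; in practice fails() would
have to write the candidate lines to a temporary file, run buildhash
on it and report whether the overflow message appears. This assumes
the overflow is triggered by particular lines rather than by the total
size of the list, which I have not verified.

```python
def minimize(lines, fails):
    """Shrink `lines` while `fails(lines)` remains true.

    Simplified delta debugging: try to drop chunks of decreasing size
    and keep every removal that still reproduces the failure.
    """
    if not fails(lines):
        raise ValueError("the full input does not reproduce the failure")
    chunk = max(len(lines) // 2, 1)
    while True:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and fails(candidate):
                lines = candidate  # chunk was not needed, drop it
            else:
                i += chunk         # chunk is needed, keep it and move on
        if chunk == 1:
            return lines
        chunk //= 2


# Toy predicate standing in for "buildhash reports an overflow":
# the failure needs both "a" and "b" to be present.
reduced = minimize(list("xaxxbxx"), lambda ls: "a" in ls and "b" in ls)
```

Because lines are only ever removed, the remaining words keep their
relative order, so a sorted input stays sorted.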