I have experimented with Voikko. For the Finnish language, Voikko does a much better job at proofreading than aspell or the other *spells.

First I tried enchant from the package libenchant1c2a, but got mixed results, and there seems to be no way to give it lists of OK words (the docs are sparse). Then I wrote my own program using pyenchant, which has ample and good documentation. This seems to work better than enchant, and lists of OK words can be used.

My Python program is in d-i SVN. Sorry, I stored it there in an unsuitable directory, but I wanted to keep it somewhere safe and accessible to others:

d-i/trunk/manual/po/fi/list-unkn-words.py

The same program should work for all languages that have an enchant backend, but that list is not very long at the moment.

$ cat /usr/share/enchant/enchant.ordering
*:myspell,aspell,ispell
fi:voikko,ispell,myspell,aspell
fi_FI:voikko,ispell,myspell,aspell
he:hspell,myspell
he_IL:hspell,myspell
yi:uspell
tr:zemberek
tr_TR:zemberek

It seems to work for en_UK and en_US.

$ ./list-unkn-words.py -d en_UK -l ../../en/hardware/hardware.xml
1 fjp
2 para
1 req
5 xml

Next I will try to figure out the spellcheck scripts that are currently used for Debian Installer spellchecking. I think I have understood the workflow to some extent, but I do not yet have any good ideas on how to change the proofreading for Finnish.

PS. I made a shell script, make-fi-all.sh, to create the fi_all.po file that list-unkn-words.py takes as input. I stripped away the " characters at the beginning and end of each line, and the superfluous lines that extract_msg.pl writes at the beginning of the file.

--
Tapio Lehtonen
tapio.lehtonen@iki.fi
http://www.iki.fi/tapio.lehtonen
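
For readers who want to try the approach described above, here is a minimal sketch of a pyenchant-based unknown-word counter. It is not the list-unkn-words.py script from SVN, whose source is not shown here. The function names, the word-splitting regular expression and the argument order of main() are all assumptions made for illustration. Only enchant.Dict, enchant.DictWithPWL and their check() method come from pyenchant itself.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of Unicode letters, optionally joined by ' or -.
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def count_unknown_words(text, is_known):
    """Count every word in text for which is_known(word) is false."""
    unknown = Counter()
    for word in WORD_RE.findall(text):
        if not is_known(word):
            unknown[word] += 1
    return unknown


def make_enchant_checker(lang, ok_words_file=None):
    """Return a check(word) callable backed by pyenchant.

    enchant is imported lazily so that count_unknown_words() can be
    used (and tested) on machines without enchant installed.
    """
    import enchant

    if ok_words_file:
        # DictWithPWL adds a personal word list (one OK word per line)
        # on top of the backend dictionary for the given language.
        return enchant.DictWithPWL(lang, ok_words_file).check
    return enchant.Dict(lang).check


def main(argv):
    """Hypothetical entry point: main([lang, path]) or main([lang, path, ok_words])."""
    lang, path = argv[0], argv[1]
    ok_words_file = argv[2] if len(argv) > 2 else None
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    unknown = count_unknown_words(text, make_enchant_checker(lang, ok_words_file))
    # Print "count word" pairs sorted by word.
    for word in sorted(unknown):
        print(unknown[word], word)
```

To try it, call main() with a language tag and a file name, for example main(["fi_FI", "fi_all.po"]), optionally followed by the path of a file of OK words. Keeping the counting separate from the enchant lookup means the same loop can be exercised with any checker function, which is convenient when the backend for a language is not installed.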