
Proofreading Finnish text



I have experimented with Voikko. For the Finnish language, Voikko does a
much better job at proofreading than aspell or the other *spells. First I
tried enchant from the package libenchant1c2a, but got mixed results,
and there seems to be no way to give it lists of OK words (the docs are
sparse).

Then I wrote my own program using pyenchant, which has ample and good
documentation. This seems to work better than plain enchant, and lists
of OK words can be used.
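The idea can be illustrated with a minimal sketch, assuming pyenchant is installed: `enchant.Dict(tag)` returns a dictionary object whose `check(word)` is True for known words, and a set of OK words is consulted before the dictionary. The names `count_unknown_words` and `WORD_RE`, the tokenizing regex and the argument order are my own illustration, not the actual list-unkn-words.py.

```python
import re
import sys
from collections import Counter
from typing import Callable, Iterable

# Simplistic tokenizer (assumption): runs of Unicode letters, so "äiti" is one word.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_unknown_words(lines: Iterable[str],
                        is_known: Callable[[str], bool],
                        ok_words: set[str]) -> Counter:
    """Count words that are neither in ok_words nor accepted by is_known."""
    unknown = Counter()
    for line in lines:
        for word in WORD_RE.findall(line):
            if word in ok_words or is_known(word):
                continue
            unknown[word] += 1
    return unknown


def main(tag: str, ok_file: str, text_file: str) -> None:
    import enchant  # pyenchant; imported here so the helper above works without it

    with open(ok_file, encoding="utf-8") as f:
        ok_words = {w.strip() for w in f if w.strip()}
    checker = enchant.Dict(tag)  # e.g. "fi" uses voikko if enchant.ordering says so
    with open(text_file, encoding="utf-8") as f:
        counts = count_unknown_words(f, checker.check, ok_words)
    # Print "count   word", sorted alphabetically by word.
    for word in sorted(counts):
        print(f"{counts[word]:6d}   {word}")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(*sys.argv[1:])
    else:
        print("usage: count_unknown.py LANG OK_WORDS_FILE TEXT_FILE", file=sys.stderr)
```

pyenchant's `enchant.DictWithPWL(tag, path)` is an alternative way to supply the OK words, as a personal word list file merged into the dictionary.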

My Python program is in the d-i SVN repository. Sorry, I stored it in an
unsuitable directory, but I wanted to keep it somewhere safe and
accessible to others:
d-i/trunk/manual/po/fi/list-unkn-words.py

The same program should work for every language that has an enchant
backend, but that list is not very long at the moment:
$ cat /usr/share/enchant/enchant.ordering
*:myspell,aspell,ispell
fi:voikko,ispell,myspell,aspell
fi_FI:voikko,ispell,myspell,aspell
he:hspell,myspell
he_IL:hspell,myspell
yi:uspell
tr:zemberek
tr_TR:zemberek
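Each line of that file has the form `language_tag:provider,provider,...`, with `*` as the default entry. The sketch below parses the listing and resolves a tag with a simplified rule, exact tag first and then `*`. That rule is my assumption of how enchant uses the file; the real library may also retry with the bare language code or append providers that are not listed, so treat this only as a reading aid. `parse_ordering` and `providers_for` are hypothetical helpers.

```python
from typing import Dict, List

# The ordering file contents shown above.
SAMPLE = """\
*:myspell,aspell,ispell
fi:voikko,ispell,myspell,aspell
fi_FI:voikko,ispell,myspell,aspell
he:hspell,myspell
he_IL:hspell,myspell
yi:uspell
tr:zemberek
tr_TR:zemberek
"""


def parse_ordering(text: str) -> Dict[str, List[str]]:
    """Parse 'tag:provider1,provider2,...' lines into {tag: [providers]}."""
    ordering: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        tag, providers = line.split(":", 1)
        ordering[tag.strip()] = [p.strip() for p in providers.split(",") if p.strip()]
    return ordering


def providers_for(tag: str, ordering: Dict[str, List[str]]) -> List[str]:
    """Simplified lookup (assumption): exact tag, otherwise the '*' default."""
    return ordering.get(tag, ordering.get("*", []))
```

For example, `providers_for("fi_FI", parse_ordering(SAMPLE))` gives voikko first, while `en_US` falls back to the `*` order.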

It seems to work for en_UK and en_US:

./list-unkn-words.py -d en_UK -l ../../en/hardware/hardware.xml 
     1   fjp
     2   para
     1   req
     5   xml

Next I will try to figure out the spellcheck scripts currently used for
Debian Installer spellchecking. I think I roughly understand the
workflow, but I do not yet have any good ideas on how to change the
proofreading for Finnish.

PS I made a shell script, make-fi-all.sh, to create the fi_all.po file
   that list-unkn-words.py reads. It strips the " characters at the
   beginning and end of each line, and the superfluous lines that
   extract_msg.pl puts at the beginning of the file.
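The clean-up step can be sketched in Python, assuming the input is one quoted string per line. The real make-fi-all.sh is a shell script, and the post does not say how many header lines extract_msg.pl emits, so `header_lines` is a hypothetical parameter and `strip_po_lines` a hypothetical name. Escaped quotes inside the strings are left as they are.

```python
from typing import Iterable, List


def strip_po_lines(lines: Iterable[str], header_lines: int) -> List[str]:
    """Drop the first header_lines lines, then strip one '"' at each end."""
    result: List[str] = []
    for i, line in enumerate(lines):
        if i < header_lines:
            continue  # superfluous lines at the top of the file
        line = line.rstrip("\n")
        if line.startswith('"'):
            line = line[1:]
        if line.endswith('"'):
            line = line[:-1]
        result.append(line)
    return result
```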

-- 
Tapio Lehtonen
tapio.lehtonen@iki.fi
http://www.iki.fi/tapio.lehtonen


