
How to limit the number of "unknown words" detected by D-I l10n spellchecker



Quoting Tirumurti Vasudevan (agnihot3@gmail.com):

(this was a private mail but I think the answer deserves being sent to
-i18n as well...and there was nothing private in it, indeed)

Background: a spellchecker runs on the D-I PO files. Its
results are shown at http://d-i.debian.org/l10n-spellcheck/. Ideally,
every language should have 0 "unknown words" and no other errors
detected. In practice, only French is in that situation. :-)

Translators who are subscribed to the "websec" automated notifications
get a notification when the number of unknown words changes.

> is there anything i can do about the unknown words?


You can define exceptions. The script that generates the spellchecking
stats is l10n-spellcheck. It is kept in the D-I SVN repository at
svn+ssh://svn.debian.org/svn/d-i/trunk/scripts/l10n/l10n-spellcheck

It has "configuration" files that are mostly lists of words that are
not in the aspell-<language> word list but that the translator knows
are valid.

There is one list per level of D-I (so that makes 3 lists).

For instance, level 1 lists are kept in
svn+ssh://svn.debian.org/svn/d-i/trunk/scripts/l10n/l10n-spellcheck/cfg/level1/wls

You can have a look at the exceptions I manage in the fr.txt file.

Of course, the first round is hard because there are *many* "words"
that are unknown to spellcheckers and word lists. So you need to go
through the detected "errors" and decide, one by one, if this is
something that's legitimate....or maybe a typo of yours!

It should take a couple of hours (well, a couple that's probably over 2....)
Once done, it becomes much easier to maintain.
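
To make the review round above more concrete, here is a minimal sketch
in Python of the bookkeeping involved: given the words the spellchecker
reports and the exceptions already accepted, list what is still left to
review. It is not part of l10n-spellcheck. The function names are
hypothetical, and the one-word-per-line layout and case-sensitive
matching are assumptions about the word list files, not something
checked against the real ones.

```python
def load_word_list(lines):
    """Collect the words from an iterable of lines.

    Assumption: one word per line; blank lines are ignored.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word:
            words.add(word)
    return words


def remaining_unknown(unknown_words, exceptions):
    """Return, sorted, the reported words not covered by the exceptions.

    Matching is case-sensitive (an assumption for this sketch).
    """
    return sorted(set(unknown_words) - exceptions)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real word list.
    exceptions = load_word_list(["debconf\n", "\n", "GRUB\n"])
    reported = ["GRUB", "partitionnement", "debconf", "instalation"]
    for word in remaining_unknown(reported, exceptions):
        print(word)
```

Each word that remains still needs a human decision: add it to the
exception list if it is legitimate, or fix the translation if it is a
typo.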




