How to limit the number of "unknown words" detected by D-I l10n spellchecker
Quoting Tirumurti Vasudevan (agnihot3@gmail.com):
(This was a private mail, but I think the answer deserves to be sent to
-i18n as well, and there was nothing private in it anyway.)
Background: there is a spellchecker running on the D-I PO files. Its
results are shown at http://d-i.debian.org/l10n-spellcheck/. Ideally,
every language should have 0 "unknown words" and no other errors
detected. In practice, only French is in that state. :-)
Translators who are subscribed to the "websec" automated notifications
get a notification when the number of unknown words changes.
> is there anything i can do about the unknown words?
You can define exceptions. The script that generates the spellchecking
stats is l10n-spellcheck. It is kept in D-I SVN at
svn+ssh://svn.debian.org/svn/d-i/trunk/scripts/l10n/l10n-spellcheck
It has "configuration" files that are mostly lists of words that are
not in the aspell-<language> word list but that the translator knows
are valid.
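
To illustrate the idea (this is only a sketch, not the actual
l10n-spellcheck code), an exception list can be thought of as a set of
words that gets subtracted from whatever the spellchecker reports. The
one-word-per-line file format and the function names below are
assumptions made for the example:

```python
def load_wordlist(text):
    """Parse a word list, ASSUMING one word per line (hypothetical format)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def remaining_unknown(unknown_words, exceptions):
    """Return the reported words not covered by the exception list.

    The original order of the report is preserved.
    """
    return [word for word in unknown_words if word not in exceptions]


# Hypothetical data: words the translator has declared valid...
exceptions = load_wordlist("Debian\nGRUB\nLVM\n")
# ...and words the spellchecker reported as unknown.
reported = ["Debian", "partitionement", "LVM", "GRUB"]

print(remaining_unknown(reported, exceptions))
```

Only the words left over after this subtraction would count as
"unknown words" in the statistics.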
There is one list per level of D-I (so that makes 3 lists).
For instance, level 1 lists are kept in
svn+ssh://svn.debian.org/svn/d-i/trunk/scripts/l10n/l10n-spellcheck/cfg/level1/wls
You can have a look at exceptions I manage in the fr.txt file.
Of course, the first round is hard because there are *many* "words"
that are unknown to spellcheckers and word lists. So you need to go
through the detected "errors" and decide, one by one, whether each is
legitimate... or maybe a typo of yours!
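
Once you have decided which words are legitimate, they have to be added
to the word list. A minimal sketch of that merge step follows, again
assuming one word per line; the helper name is hypothetical, and
keeping the list sorted and free of duplicates is my own choice for
readability, not something the script is known to require:

```python
def merge_accepted(existing_lines, accepted_words):
    """Merge newly accepted words into an existing word list.

    ASSUMPTION: the list holds one word per line. Blank entries are
    dropped, duplicates are collapsed, and the result is sorted.
    """
    words = {line.strip() for line in existing_lines if line.strip()}
    words.update(word.strip() for word in accepted_words if word.strip())
    return sorted(words)


# Hypothetical example: current list contents plus words just reviewed.
current = ["GRUB\n", "Debian\n"]
accepted = ["LVM", "GRUB"]

for word in merge_accepted(current, accepted):
    print(word)
```

Words you identified as typos are left out of the list and fixed in
the PO file instead.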
It should take a couple of hours (well, a "couple" that is probably more than 2...).
Once done, it becomes much easier to maintain.