How to limit the number of "unknown words" detected by D-I l10n spellchecker

To: Tirumurti Vasudevan <agnihot3@gmail.com>, debian-i18n@lists.debian.org
Subject: How to limit the number of "unknown words" detected by D-I l10n spellchecker
From: Christian PERRIER <christian@perrier.eu.org>
Date: Mon, 19 Apr 2010 07:45:44 +0200
Message-id: <20100419054544.GG28250@mykerinos.kheops.frmug.org>
In-reply-to: <n2xc379de981004182047j9af95ce5m2e5453cbf1e1c684@mail.gmail.com>
References: <20100216055507.GA29459@ravel.debian.org> <n2xc379de981004182047j9af95ce5m2e5453cbf1e1c684@mail.gmail.com>

Quoting Tirumurti Vasudevan (agnihot3@gmail.com):

(This was a private mail, but I think the answer deserves to be sent to
-i18n as well, and there was nothing private in it anyway.)

Background: there is a spellchecker running on the D-I PO files. Its
results are shown at http://d-i.debian.org/l10n-spellcheck/. Ideally,
every language should have 0 "unknown words" and no other errors
detected. In practice, only French is in that state. :-)

Translators who are subscribed to the "websec" automated notifications
get a notification when the number of unknown words changes.

> is there anything i can do about the unknown words?


You can define exceptions. The script that generates the spellchecking
stats is l10n-spellcheck. It is kept in the D-I SVN repository at
svn+ssh://svn.debian.org/svn/d-i/trunk/scripts/l10n/l10n-spellcheck

It has "configuration" files that are mostly lists of words that are
missing from the aspell-<language> word list but that the translator
knows to be valid.

There is one list per level of D-I (so that makes 3 lists).

For instance, level 1 lists are kept in
svn+ssh://svn.debian.org/svn/d-i/trunk/scripts/l10n/l10n-spellcheck/cfg/level1/wls

You can have a look at exceptions I manage in the fr.txt file.
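
To illustrate the idea only, here is a minimal Python sketch of how such
an exception list reduces the "unknown words" count. It is not the
actual l10n-spellcheck code: the function names are hypothetical, the
sample words are made up, and the one-word-per-line layout is an
assumption about the list files.

```python
def load_exceptions(lines):
    """Return the set of accepted words from an iterable of lines.

    Assumption: one word per line; blank lines are ignored.
    """
    return {line.strip() for line in lines if line.strip()}


def remaining_unknown(unknown_words, exceptions):
    """Keep only the words not covered by the exception list, in order."""
    return [word for word in unknown_words if word not in exceptions]


# Hypothetical data: words the translator has accepted as valid...
exceptions = load_exceptions(["debconf", "", "partman  ", "GRUB"])
# ...and words reported as unknown by the spellchecker.
reported = ["debconf", "instalation", "GRUB", "partman"]

left = remaining_unknown(reported, exceptions)
print(left)
```

With this sample data, only the real typo ("instalation") is left to
fix in the translation itself.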

Of course, the first round is hard because there are *many* "words"
that are unknown to spellcheckers and word lists. So you need to go
through the detected "errors" and decide, one by one, whether each is
legitimate... or maybe a typo of yours!

It should take a couple of hours (well, a "couple" that is probably
more than 2...). Once done, the list becomes much easier to maintain.
