Bug#1019541: lintian: New spelling corrections should be automatically checked against an american and a british english dictionary

To: Andreas Beckmann <anbe@debian.org>, 1019541@bugs.debian.org
Subject: Bug#1019541: lintian: New spelling corrections should be automatically checked against an american and a british english dictionary
From: Axel Beckert <abe@debian.org>
Date: Sun, 11 Sep 2022 16:36:19 +0200
Message-id: <20220911143619.elc2ve2gj5j3t2bw@sym.noone.org>
Reply-to: Axel Beckert <abe@debian.org>, 1019541@bugs.debian.org
In-reply-to: <20220911135437.vgbloviouece7ryo@sym.noone.org>
References: <166242398538.11870.16498883358845040240.reportbug@zam504.zam.kfa-juelich.de> <166242398538.11870.16498883358845040240.reportbug@zam504.zam.kfa-juelich.de> <20220911135437.vgbloviouece7ryo@sym.noone.org> <166242398538.11870.16498883358845040240.reportbug@zam504.zam.kfa-juelich.de>

Hi again,

Axel Beckert wrote:
> > $ grep -v ^# /usr/share/lintian/data/spelling/corrections | cut -d '|' -f 1 | while read word ; do grep "^$word\$" /usr/share/dict/american-english /usr/share/dict/british-english ; done
> 
> Thanks for figuring out this nice little command! I will try to
> optimize it, though, to not call grep for each word but use something like:
> 
>   grep -Fw -f <(grep -v '^#' /usr/share/lintian/data/spelling/corrections | cut -d '|' -f 1) /usr/share/dict/american-english /usr/share/dict/british-english

In the end, this will probably be implemented in Perl instead, as there
are similar checks in t/scripts/spellintian.t already.

> I now wonder if we should use wamerican/wbritish or
> wamerican-insane/wbritish-insane for that. Maybe wamerican/wbritish is
> a good start and if we still get too many false positives, we can
> extend it to use wamerican-insane/wbritish-insane. (The latter will
> probably also take longer. But then again with my optimized query
> above it also just takes less than a second on a 7-year-old laptop.
> And it yields about 350 hits.)

Some more points on this question:

t/scripts/spellintian.t already has (only) two checks for rare but valid
words, so that they don't get added again, namely "iff" and
"publically".

Neither of these words is in /usr/share/dict/*-english, but both are in
/usr/share/dict/*-english-insane.
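
A minimal sketch of that difference, not the planned Perl implementation:
the helper name find_valid_words and all sample file contents below are
made up for illustration. It uses grep -Fx (whole-line match) rather than
-Fw as in the command quoted above, since -w would also report dictionary
entries such as possessives that merely contain the word.

```shell
#!/bin/sh
# Hypothetical helper: print every misspelling (first '|'-separated field,
# comment lines skipped) from the corrections file $1 that also appears as
# a complete line in any of the word lists given as remaining arguments.
find_valid_words() {
    corrections=$1
    shift
    grep -v '^#' "$corrections" | cut -d '|' -f 1 |
        grep -hFx -f - "$@" | sort -u
}

# Made-up sample data standing in for the real corrections file and the
# regular and "insane" word lists.
tmp=$(mktemp -d)
printf '%s\n' '# comment' 'licence|license' 'teh|the' 'iff|if' > "$tmp/corrections"
printf '%s\n' 'licence' "licence's" 'license' 'the' > "$tmp/small"
printf '%s\n' 'iff' 'licence' 'license' 'the' > "$tmp/insane"

# Only the regular word list: prints "licence".
find_valid_words "$tmp/corrections" "$tmp/small"

# Regular plus "insane" word list: prints "iff" and "licence".
find_valid_words "$tmp/corrections" "$tmp/small" "$tmp/insane"

rm -rf "$tmp"
```

With the real data, the first argument would be
/usr/share/lintian/data/spelling/corrections and the remaining arguments
the word lists under /usr/share/dict/.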

		Regards, Axel
-- 
 ,''`.  |  Axel Beckert <abe@debian.org>, https://people.debian.org/~abe/
: :' :  |  Debian Developer, ftp.ch.debian.org Admin
`. `'   |  4096R: 2517 B724 C5F6 CA99 5329  6E61 2FF9 CD59 6126 16B5
  `-    |  1024D: F067 EA27 26B9 C3FC 1486  202E C09E 1D89 9593 0EDE
