Bug#1019235: lintian: 'licence' is not a misspelling
Control: clone -1 -2
Control: tag -1 + confirmed pending
Control: retitle -2 lintian: New spelling corrections should be automatically checked against an american and a british english dictionary
Control: severity -2 wishlist
Hi Andreas,
Andreas Beckmann wrote:
> 'licence' is a valid (mostly british) variant of license
Yep, noticed this as well before I saw your bug report. Already fixed in
https://salsa.debian.org/lintian/lintian/-/commit/7d801b2c9c88683051afe0937b46f065cb8873a2
> Perhaps (new) spelling corrections should be automatically checked
> against an american and a british english dictionary and carefully
> reconsidered if they are found?
Good idea! Cloning the bug report for that accordingly as this is a
separate thing.
Still don't have an idea how to actually do that, but I guess it will
be part of the test suite, not a commit hook.
> Without implying to delete all the matches (I haven't heard most of the
> matching words and would need to look up their meaning...):
>
> $ grep -v ^# /usr/share/lintian/data/spelling/corrections | cut -d '|' -f 1 | while read word ; do grep "^$word\$" /usr/share/dict/american-english /usr/share/dict/british-english ; done
Thanks for figuring out this nice little command! I will try to optimize
it though, so that it does not call grep for each word but uses something like:
grep -Fw -f <(grep -v '^#' /usr/share/lintian/data/spelling/corrections | cut -d '|' -f 1) /usr/share/dict/american-english /usr/share/dict/british-english
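One caveat on the flags, sketched here on toy data rather than on the real files: with -w a pattern also hits dictionary lines such as possessives, because the apostrophe counts as a word boundary, whereas -x only accepts whole-line matches like the original "^$word\$" loop does. The file names below are made up for illustration.

```shell
# Toy data in a temporary directory; none of these are the real lintian files.
tmp=$(mktemp -d)
printf '%s\n' 'dependant' > "$tmp/words"
printf '%s\n' 'dependant' "dependant's" 'independent' > "$tmp/dict"

# -w: selects every line that contains the pattern as a whole word.
hits_w=$(grep -Fw -f "$tmp/words" "$tmp/dict")

# -x: selects only lines that consist of exactly the pattern.
hits_x=$(grep -Fx -f "$tmp/words" "$tmp/dict")

printf '%s\n' "-w:" "$hits_w" "-x:" "$hits_x"
rm -rf "$tmp"
```

So the hit count from the -w variant may be somewhat higher than what a strict whole-line comparison would give.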
I now wonder if we should use wamerican/wbritish or
wamerican-insane/wbritish-insane for that. Maybe wamerican/wbritish is
a good start and if we still get too many false positives, we can
extend it to use wamerican-insane/wbritish-insane. (The latter will
probably also take longer. But then again with my optimized query
above it also just takes less than a second on a 7 year old laptop.
And it yields about 350 hits.)
Some comments about some of those you found:
> /usr/share/dict/american-english:bellow
> /usr/share/dict/british-english:bellow
> /usr/share/dict/american-english:singed
> /usr/share/dict/british-english:singed
Would keep these. The chances that it is a misspelling of "below" or
"signed" are IMHO much higher than the chance that it is used in
Debian in its actual meaning.
So in case we write a test for this, we should probably list
exceptions we want to keep in that test.
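A minimal sketch of how such a check might look, assuming whole-line matching and a hypothetical exceptions file with one word per line. The function name dict_hits and its arguments are made up for illustration and are not anything that exists in lintian:

```shell
# Hypothetical helper: print entries of the corrections list that are also
# dictionary words, minus the exceptions we deliberately want to keep.
# Usage: dict_hits CORRECTIONS EXCEPTIONS DICT...
dict_hits() {
    corrections=$1
    exceptions=$2
    shift 2
    # Left-hand column of the corrections file, without comments or blank lines.
    grep -v '^#' "$corrections" | cut -d '|' -f 1 | grep -v '^$' |
        # -F fixed strings, -x whole-line match, -h no file name prefix;
        # "-f -" reads the patterns from standard input.
        grep -Fxh -f - "$@" |
        # A word found in several dictionaries is reported once.
        sort -u |
        # Drop the words listed as exceptions.
        grep -Fxv -f "$exceptions"
}
```

It could be called as "dict_hits /usr/share/lintian/data/spelling/corrections exceptions.txt /usr/share/dict/american-english /usr/share/dict/british-english", where exceptions.txt (a made-up name) would hold words like "bellow" and "singed". A test could then fail whenever the output is non-empty.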
> /usr/share/dict/american-english:convertor
> /usr/share/dict/british-english:convertor
> /usr/share/dict/american-english:dependance
> /usr/share/dict/american-english:dependant
> /usr/share/dict/british-english:dependant
> /usr/share/dict/american-english:extravert
> /usr/share/dict/british-english:extravert
> /usr/share/dict/american-english:extraverts
> /usr/share/dict/british-english:extraverts
> /usr/share/dict/american-english:licence
> /usr/share/dict/british-english:licence
> /usr/share/dict/american-english:miniscule
> /usr/share/dict/british-english:miniscule
> /usr/share/dict/american-english:venders
> /usr/share/dict/american-english:vender
> /usr/share/dict/american-english:want's
> /usr/share/dict/british-english:want's
These should probably be removed. They all look like alternative
spellings, either historic or local.
Not sure about the remaining ones.
Regards, Axel
--
,''`. | Axel Beckert <abe@debian.org>, https://people.debian.org/~abe/
: :' : | Debian Developer, ftp.ch.debian.org Admin
`. `' | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
`- | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE