Bug#1013946: lintian: wrongly report unknown-locale-code ber

To: Axel Beckert <abe@debian.org>, Russ Allbery <rra@debian.org>, 1013946@bugs.debian.org
Cc: Fabio Fantoni <fantonifabio@tiscali.it>
Subject: Bug#1013946: lintian: wrongly report unknown-locale-code ber
From: "Dr. Tobias Quathamer" <toddy@debian.org>
Date: Wed, 29 Jun 2022 21:16:21 +0200
Message-id: <49e840a9-3ca9-0b7b-56c5-66ca7638c17c@debian.org>
Reply-to: "Dr. Tobias Quathamer" <toddy@debian.org>, 1013946@bugs.debian.org
In-reply-to: <20220628003129.j3hc53qfd7jtxsyc@sym.noone.org>
References: <b053c05b-d6ef-c563-3786-896e672e5938@tiscali.it> <87v8sleqpl.fsf@hope.eyrie.org> <20220627231659.otmq7f3pft63nkvl@sym.noone.org> <87mtdxel0c.fsf@hope.eyrie.org> <20220628003129.j3hc53qfd7jtxsyc@sym.noone.org>

On 28.06.22 at 02:31, Axel Beckert wrote:

> Still would be happy about input from Toddy on this. :-)


Hi all,

thanks a lot for your research and insights ...

I'm not an ISO expert, either, but from my reading and understanding the relationship between the standards (and the intended use) is as follows:

First, ISO 639 (without the suffix -1) was created, and it included most of the major (spoken and written) languages of the world. All included languages had a two-letter code (like "en" for English).

As it quickly turned out, the two-letter code was not enough to categorize all written and spoken languages. :-)

Therefore, ISO 639 became ISO 639-1 and the development of ISO 639-2 started. As far as I know, ISO 639-1 is a strict subset of ISO 639-2.

The standard ISO 639-2 introduced a three-letter code, so they could include many more languages. As the standard was intended to be used in bibliographic contexts, they created a code for every individual language which has at least a "modest body of literature" (whatever "modest" means here).

In order to accommodate languages with an even smaller body of literature, they created collections of languages, called "language groups" or "families". One such example is the Berber languages, which you've discovered.

Lastly, ISO 639-3 and ISO 639-5 have been created. Those standards aim to make a clear distinction between individual languages (in ISO 639-3) and language groups or families (in ISO 639-5).

Apart from the language families (like Berber), every element that represents an individual language in ISO 639-2 is included in ISO 639-3. So for individual languages, ISO 639-2 is a strict subset of ISO 639-3.

I'm not entirely sure if it's a good idea to use a language family as a locale (or in this context, a program translation). It might work for the Berber example, if the Berber languages are really so similar that it doesn't matter which language it is exactly. However, I don't know anything about Berber languages, so I cannot tell if this approach makes sense.

From a quick search, there are at least the Kabyle language, the Shilha language, the Tuareg languages, Tarifit, and Central Atlas Tamazight, which are grouped together as Berber languages.

The safest way for lintian would probably be to use ISO 639-3 as a source for locale checking, because those codes represent an individual language. The vast majority of program translations are into an individual language, so the check seems plausible.

For bonus points, you could also check ISO 639-5 and print a warning (or info) that this locale code represents a language group rather than an individual language. :-)

This is essentially Axel's latest suggestion -- except that I'd suggest using ISO 639-3 instead of ISO 639-2 as the authoritative source.
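A minimal sketch of that two-tier check, written in Python purely for illustration (lintian itself is written in Perl). The code sets below are hypothetical, tiny excerpts chosen for this example, and the function names are made up; a real check would load the complete ISO 639-3 and ISO 639-5 tables (for instance from the iso-codes data, assuming it ships both). ISO 639-3 alone decides whether a code is valid, while ISO 639-5 only turns an otherwise unknown code into a "language group" hint.

```python
# Two-tier locale-code check: ISO 639-3 decides validity,
# ISO 639-5 only adds an informational "language group" hint.

# Hypothetical, tiny excerpts; a real check would load the complete tables.
ISO_639_3 = {"eng", "deu", "kab", "shi", "rif", "tzm", "tmh"}
ISO_639_5 = {"ber", "gem", "roa"}
# Two-letter ISO 639-1 codes mapped onto ISO 639-3 codes (excerpt only).
ISO_639_1_TO_3 = {"en": "eng", "de": "deu"}


def language_part(locale: str) -> str:
    """Return the language code of a locale name like 'kab_DZ.UTF-8@latin'."""
    # Drop the @modifier, then the .codeset, then the _TERRITORY suffix.
    return locale.split("@")[0].split(".")[0].split("_")[0]


def classify_locale(locale: str) -> str:
    """Return 'individual', 'group' or 'unknown' for a locale name."""
    lang = language_part(locale)
    if len(lang) == 2:
        # Translate ISO 639-1 codes; unmapped ones stay as-is and end up 'unknown'.
        lang = ISO_639_1_TO_3.get(lang, lang)
    if lang in ISO_639_3:
        return "individual"  # valid code for an individual language
    if lang in ISO_639_5:
        return "group"  # valid, but a language group: warning or info
    return "unknown"  # would still trigger unknown-locale-code


if __name__ == "__main__":
    for code in ("kab", "ber", "en_GB.UTF-8", "xx"):
        print(code, classify_locale(code))
```

Run as a script, this prints "kab individual", "ber group", "en_GB.UTF-8 individual" and "xx unknown", so "ber" would no longer be reported as unknown but could still get the bonus-points hint.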

Sorry for this long e-mail, but languages and their codes are pretty hard ...


Regards,
Tobias

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Follow-Ups:
- Bug#1013946: lintian: wrongly report unknown-locale-code ber
  - From: Axel Beckert <abe@debian.org>

References:
- Bug#1013946: lintian: wrongly report unknown-locale-code ber
  - From: Fabio Fantoni <fantonifabio@tiscali.it>
- Bug#1013946: lintian: wrongly report unknown-locale-code ber
  - From: Russ Allbery <rra@debian.org>
- Bug#1013946: lintian: wrongly report unknown-locale-code ber
  - From: Russ Allbery <rra@debian.org>
- Bug#1013946: lintian: wrongly report unknown-locale-code ber
  - From: Axel Beckert <abe@debian.org>