
Bug#1013946: lintian: wrongly report unknown-locale-code ber



On 28.06.22 at 02:31, Axel Beckert wrote:
> Still would be happy about input from Toddy on this. :-)

Hi all,

thanks a lot for your research and insights ...

I'm not an ISO expert, either, but as I understand it, the relationship between the standards (and their intended use) is as follows:

First, ISO 639 (without the suffix -1) was created, and it included most of the major (spoken and written) languages of the world. Every included language had a two-letter code (like "en" for English).

As it quickly turned out, two-letter codes were not enough to categorize all written and spoken languages. :-)

Therefore, ISO 639 became ISO 639-1 and the development of ISO 639-2 started. As far as I know, every language in ISO 639-1 also has a code in ISO 639-2, so in terms of languages covered, ISO 639-1 is a strict subset of ISO 639-2.

The standard ISO 639-2 introduced three-letter codes, so it could include many more languages. As the standard was intended to be used in bibliographic contexts, it assigns a code to every individual language which has at least a "modest body of literature" (whatever "modest" means here).

In order to accommodate languages with an even smaller body of literature, they created collections of languages, called "language groups" or "families". One such example is the Berber languages, which you've discovered.

Lastly, ISO 639-3 and 639-5 were created. Those standards aim to make a clear distinction between individual languages (in ISO 639-3) and language groups or families (in ISO 639-5).

Apart from the language families (like Berber), every element that represents an individual language in ISO 639-2 is included in ISO 639-3. So for individual languages, ISO 639-2 is a strict subset of ISO 639-3.



I'm not entirely sure if it's a good idea to use a language family as a locale (or in this context, program translation). It might work for the Berber example, if the Berber languages are really so similar that it doesn't matter which language it is exactly. However, I don't know anything about Berber languages, so I cannot tell if this approach makes sense.

From a quick search, the Berber languages include at least Kabyle, Shilha, the Tuareg languages, Tarifit, and Central Atlas Tamazight.



The safest way for lintian would probably be to use ISO 639-3 as a source for locale checking, because those codes represent an individual language. The vast majority of program translations are into an individual language, so the check seems plausible.

For bonus points, you could also check ISO 639-5 and print a warning (or info) that this locale code represents a language group rather than an individual language. :-)
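
To make the idea concrete, here is a rough sketch of such a check. It is written in Python rather than lintian's Perl, and it only illustrates the logic: the function name, the return values, and the tiny hand-picked code tables are all made up for this example, and a real check would load the complete ISO tables. I have also assumed that the two-letter ISO 639-1 codes stay valid, since most locale names use them.

```python
# Hypothetical excerpts of the code tables; a real check would load
# the full lists instead of these few hand-picked entries.
INDIVIDUAL_LANGUAGES = {
    "en", "de",                  # ISO 639-1 two-letter codes (assumed to stay valid)
    "eng", "deu",                # ISO 639-3 equivalents
    "kab", "shi", "tzm", "rif",  # ISO 639-3: Kabyle, Shilha, Central Atlas Tamazight, Tarifit
}
LANGUAGE_GROUPS = {"ber"}        # ISO 639-5: Berber languages


def classify_locale_code(locale: str) -> str:
    """Return 'ok', 'language-group' or 'unknown' for a locale name.

    Accepts names such as 'de', 'de_DE', 'de_DE.UTF-8' or 'de_DE@euro'.
    """
    # Drop the modifier, the encoding and the territory, in that order,
    # so that only the language part remains.
    language = locale.split("@")[0].split(".")[0].split("_")[0].lower()
    if language in INDIVIDUAL_LANGUAGES:
        return "ok"
    if language in LANGUAGE_GROUPS:
        # Known code, but it names a group rather than one language:
        # worth a warning or info tag instead of an error.
        return "language-group"
    return "unknown"
```

With this excerpt, classify_locale_code("ber") yields "language-group", while "kab_DZ" passes as an individual language and an unlisted code such as "xx" is reported as "unknown".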

This is essentially Axel's latest suggestion -- except that I'd suggest using ISO 639-3 instead of ISO 639-2 as the authoritative source.

Sorry for this long e-mail, but languages and their codes are pretty hard ...

Regards,
Tobias
