
Bug#1025749: l10n stats pages: query clause to get ISO languages is too restrictive



Package: www.debian.org
X-Debbugs-CC: debian-l10n-devel@lists.debian.org,debian-i18n@lists.debian.org
Usertag: scripts 
Severity: wishlist

Hello,

To get the list of languages and countries, many of the l10n pages (such 
as the po-debconf¹ templates and PO files² pages) rely, among other 
things, on «dgettext» (see the related scripts here³⁴).

This is fine and a very convenient way to centralize resources, but the 
clause these scripts currently use to display languages is somewhat 
restrictive: it only queries the ISO 639-3 domain, a standard which does 
not include the so-called «collective languages» (e.g., «ber» for Berber 
languages).

The full explanation can be read here⁵ but, in brief, Wikipedia says 
that ISO 639-3 is not a superset of ISO 639-2, so if we query only one 
of them, we discard the codes that exist only in the other:

****
While ISO 639-2 includes three-letter identifiers for collective 
languages, these codes are excluded from ISO 639-3. Hence ISO 639-3 is 
not a superset of ISO 639-2. 
****

So, since this commit⁶ made 11 years ago, which replaced ISO 639 with 
ISO 639-3, some languages no longer show their localized names in the 
PO stats pages, because the ISO 639-3 code table does not include them 
(this is by design). For instance, see the stats page for PO files²:

aym — Unknown language
bh — Unknown language
ber — Unknown language
bos_DE — Unknown language
bos_ES — Unknown language
bos_FI — Unknown language
bos_FR — Unknown language
bos_HU — Unknown language
bos_IT — Unknown language
bos_LT — Unknown language
bos_NL — Unknown language
bos_SV — Unknown language
bos_TR — Unknown language

(...)

There are 65 more languages displayed as «Unknown language» even though 
they are fully translated in their respective iso-codes packages, 
and this happens because of this query clause inside «dtc.def»:

****
if ($lang_fullname ne '') {
        $lang_fullname = dgettext("iso_639_3", "$lang_fullname");
} else {
        return qq(<Unknown_Language>);
}
****

So I wonder if we can improve this clause to get a more inclusive query 
that can match codes from more than one ISO domain, rather than limiting 
the scope to a single standard that leaves out many others :-)

Some questions/thoughts regarding this:

Can «dgettext» query more than one ISO standard at once? That would be 
great and would make things easier here.

As for «dgettext», it looks to me like it just returns one string, but 
maybe some magic can be done before falling back to «Unknown language» 
(i.e., perform an additional query against ISO 639-2 first).
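
To make the «additional query» idea concrete, here is a minimal sketch in 
Perl, and only a sketch: the helper name «localized_language_name», the 
two small code tables and the «iso_639_2» domain name are all made up for 
illustration (only «iso_639_3» appears in dtc.def), and a stand-in 
translator is used so that the snippet runs without gettext or iso-codes 
installed. It tries each (domain, code table) pair in order and only 
gives up once every table has been checked:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of dtc.def: resolve a language code to a
# localized name by trying several (gettext domain, code table) pairs in order.
#   $code      language code, e.g. "ber"
#   $translate code ref called as $translate->($domain, $english_name);
#              in real use it could wrap dgettext from Locale::gettext
#   @sources   list of [ $domain, \%english_names_by_code ]
sub localized_language_name {
    my ($code, $translate, @sources) = @_;
    for my $source (@sources) {
        my ($domain, $names) = @$source;
        my $english = $names->{$code};
        # Code not in this table: fall through to the next standard.
        next unless defined $english && $english ne '';
        return $translate->($domain, $english);
    }
    return 'Unknown language';
}

# Tiny illustrative tables; the real data would come from iso-codes.
my %iso_639_3 = (spa => 'Spanish');
my %iso_639_2 = (spa => 'Spanish; Castilian', ber => 'Berber languages');

# Stand-in translator: tags the name with the domain instead of calling gettext.
my $fake_translate = sub {
    my ($domain, $msgid) = @_;
    return "[$domain] $msgid";
};

# ISO 639-3 is tried first, ISO 639-2 (assumed domain name) second.
my @sources = (
    [ 'iso_639_3', \%iso_639_3 ],
    [ 'iso_639_2', \%iso_639_2 ],
);

print localized_language_name('ber', $fake_translate, @sources), "\n";
print localized_language_name('spa', $fake_translate, @sources), "\n";
print localized_language_name('zzz', $fake_translate, @sources), "\n";
```

With the stand-in translator this prints «[iso_639_2] Berber languages», 
«[iso_639_3] Spanish» and «Unknown language». Keeping ISO 639-3 first 
means codes that already resolve today would keep their current names, 
and only the codes missing from that table would reach the second query.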

I hope the problem is clear and we can find a nice solution for this.

¹https://www.debian.org/international/l10n/po-debconf/index.en.html
²https://www.debian.org/international/l10n/po/index.en.html
³https://salsa.debian.org/webmaster-team/webwml/-/blob/master/english/international/l10n/dtc.def
⁴https://salsa.debian.org/webmaster-team/webwml/-/blob/master/english/international/l10n/scripts/fix-files.sh
⁵https://en.wikipedia.org/wiki/ISO_639-3#Collective_languages
⁶https://salsa.debian.org/webmaster-team/webwml/-/commit/2bb96f31eabe559e1b68313efcbaaa7af3be19d4

Kind regards,

-- 
Camaleón 

