
Re: Bug#239092: ITP: mozilla-thunderbird-locale-uk -- Mozilla Thunderbird Ukrainian Language/Region Package



On Mon, 2004-03-22 at 15:48, Andreas Barth wrote:
> Sorry, but the claim about ISO is wrong. According to ISO 3166
> http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1-semic.txt
> Ukraine is UA, and United Kingdom is GB (and as far as I know the
> _only_ country-TLD not the same as the ISO-name).

I did some googling and hopefully my understanding is now correct and
this mail clarifies things instead of muddling things up.

ISO 3166[1] lists two letter codes for countries. In that list, "GB"
means "United Kingdom" and "UA" means "Ukraine" and "UK" means nothing
(though "UK" is commonly used to refer to "United Kingdom").

ISO 639-1[2] lists two letter codes for languages. Languages and
countries do not map 1:1, so it's important to know whether the acronym
refers to a country or a language. In that list "UK" means "Ukrainian"
(the language) and "EN" means "English", "CY" means "Welsh", "GD" means
"Scottish Gaelic", and "GA" means "Irish".

ISO 639-2[3] lists three letter codes for languages. In that list,
Ukrainian is "UKR" (I'm too tired to dig up the other languages). Note
that two letter codes are not sufficient to cover all existing
languages, since there are only 26^2 = 676 combinations if you want to
stay within the 26 letter English alphabet. There are several thousand
languages still alive, I think.
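
The arithmetic above, as a quick Python check (the function name
code_space is only an illustration, not something from any standard):

```python
# Number of distinct codes of a given length over the 26 letter alphabet.
ALPHABET_SIZE = 26


def code_space(length: int) -> int:
    """Return how many codes of `length` letters are possible."""
    return ALPHABET_SIZE ** length


two_letter = code_space(2)    # 26^2 = 676
three_letter = code_space(3)  # 26^3 = 17576
print(two_letter, three_letter)
```

Three letters give 17576 combinations, which leaves room for several
thousand languages where two letters do not.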

RFC 3066[4] also specifies language codes. It specifies that two letter
codes are interpreted according to ISO 639-1, and three letter codes are
interpreted as ISO 639-2. (It has some additional stuff as well.)
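
A minimal sketch in Python of the length rule described above. The
function name and the returned labels are made up for illustration, and
the "additional stuff" in the RFC is ignored: anything that is not a
two or three letter primary subtag is simply reported as "other".

```python
def primary_subtag_standard(tag: str) -> str:
    """Say which standard the primary subtag of a language tag is read under.

    Only the two/three letter rule is implemented; this is not a full
    RFC 3066 parser.
    """
    # The primary subtag is everything before the first hyphen.
    primary = tag.split("-")[0]
    if not (primary.isascii() and primary.isalpha()):
        return "invalid"
    if len(primary) == 2:
        return "ISO 639-1"
    if len(primary) == 3:
        return "ISO 639-2"
    return "other"


for example in ("uk", "ukr", "en-GB"):
    print(example, "->", primary_subtag_standard(example))
```

So a tag starting with "uk" is read as an ISO 639-1 language code
(Ukrainian), never as a country.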

IANA maintains the list of top level domains for countries.[5] These
supposedly follow ISO 3166, at least for new top level domains. In that
list, ".uk" is "United Kingdom" (and not ".gb" as ISO 3166 would
suggest; this does indeed seem to be the only difference) and ".ua" is
"Ukraine".

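To make the three namespaces concrete, here is a small Python sketch
that looks one code up in each of them. The tables hold only a few of
the codes mentioned in this mail, not the full lists, and the function
name meanings is invented for the example. Lookups ignore case.

```python
# Tiny excerpts of each list; only codes discussed in this mail.
ISO_3166_COUNTRY = {"GB": "United Kingdom", "UA": "Ukraine"}
ISO_639_1_LANGUAGE = {
    "uk": "Ukrainian",
    "en": "English",
    "cy": "Welsh",
    "ga": "Irish",
}
CCTLD = {"uk": "United Kingdom", "ua": "Ukraine"}


def meanings(code: str) -> dict:
    """Return what `code` means in each namespace (None if not listed here)."""
    return {
        "ISO 3166 country": ISO_3166_COUNTRY.get(code.upper()),
        "ISO 639-1 language": ISO_639_1_LANGUAGE.get(code.lower()),
        "ccTLD": CCTLD.get(code.lower()),
    }


for code in ("uk", "ua", "gb"):
    print(code, meanings(code))
```

For "uk" this gives no country, the language Ukrainian, and the top
level domain of the United Kingdom, which is exactly the confusion in
this thread.
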
Since this package name identifies a language, not a country, ISO 639-1
seems like a reasonable thing to use, though I have the feeling that
moving to three letter codes might be a good idea in the future, in this
case and in, say, locale names.

Someone who really understands these things could probably point at a
good tutorial on country and language codes that goes into more detail
than quick googling allowed me to learn.

[1] http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1-semic.txt

[2] http://sunsite.berkeley.edu/amher/iso_639.html

[3] http://lcweb.loc.gov/standards/iso639-2/langhome.html

[4] ftp://ftp.rfc-editor.org/in-notes/rfc3066.txt

[5] http://www.iana.org/cctld/cctld-whois.htm

-- 
http://liw.iki.fi/liw/log/


