Re: Working on a locale to allow the use of ijekavian variant?

To: debian-l10n-serbian@lists.debian.org
Subject: Re: Working on a locale to allow the use of ijekavian variant?
From: Christian PERRIER <bubulle@debian.org>
Date: Sun, 14 Aug 2011 12:57:17 +0200
Message-id: <20110814105717.GM21371@mykerinos.kheops.frmug.org>
In-reply-to: <CAAD3jYqZQPSWc9NsA6doqQvZHxV4VWguboeuTMiQMxamCg=UZQ@mail.gmail.com>
References: <20110813065742.GY21371@mykerinos.kheops.frmug.org> <CAAD3jYqZQPSWc9NsA6doqQvZHxV4VWguboeuTMiQMxamCg=UZQ@mail.gmail.com>

(no need to CC me on replies, as I read the list)

Quoting Bojana Borkovic (bojana.borkovic@ulk.rs.ba):

> Now, if we name it sr_BA@ijekavian, to me that doesn't make sense, because it

We don't name the language this way; we name the *locale* this way.

Let's go back to the basics... and sorry in advance if this seems
pedantic. :-)

A locale is always a combination of a language and a country, because a
locale file contains information about the language (collation order,
names of days and months, etc.) and information about the country
(currency, postal information, etc.).

For translation files to be used on any Unix system, the LC_MESSAGES
locale category has to be set to a valid locale (that is, a locale
that exists in glibc; on Debian systems, the locale definition files
are in /usr/share/i18n/locales). Most often, people actually set LC_ALL
or LANG to an existing locale rather than LC_MESSAGES itself: LC_ALL
overrides every LC_* variable, and LANG provides the default for any
category that is not set. You can see your settings by typing "locale"
at the command-line prompt:

cperrier@mykerinos:~/src/debian/iso-codes/git/iso_3166$ locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

(Sounds like I'm French. :-))
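The precedence rule above (LC_ALL first, then the category variable itself, then LANG) can be sketched in a few lines of Python. This is only an illustration of the POSIX rule, with a hypothetical helper name; it ignores the GNU-specific LANGUAGE variable, which gettext also consults for messages.

```python
import os
from typing import Mapping, Optional


def effective_category(category: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the locale name used for `category` (e.g. "LC_MESSAGES").

    Hypothetical helper. Precedence: LC_ALL, then the category variable,
    then LANG. Empty values count as unset; "C" is the final default.
    """
    if env is None:
        env = os.environ
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"
```

With only LANG=fr_FR.UTF-8 set, as in the output above, `effective_category("LC_MESSAGES", {"LANG": "fr_FR.UTF-8"})` gives "fr_FR.UTF-8"; setting LC_ALL would override it.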

Then, when a (gettext-enabled) program is used, the gettext library
first looks for a <program>.mo file in
/usr/share/locale/<locale>/LC_MESSAGES/.

If none is found, the country part is removed from the locale
name and <program>.mo is searched for in
/usr/share/locale/<locale_without_country>/LC_MESSAGES/.

(these *.mo files are the compiled form of the *.po text files)
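That lookup can be sketched in Python. This is a simplified model, not gettext's actual code: it ignores the codeset and the LANGUAGE variable, and the helper names are hypothetical. It also assumes that, as I read glibc's behaviour, a modifier such as @latin is kept longer than the country, so sr_BA@ijekavian would try sr@ijekavian before sr_BA and sr; that ordering should be checked against the glibc sources before relying on it.

```python
from pathlib import Path
from typing import List, Optional


def catalog_candidates(locale_name: str) -> List[str]:
    """Directory names tried for a locale such as "sr_BA.UTF-8@ijekavian".

    Assumed order (codeset dropped): ll_CC@mod, ll@mod, ll_CC, ll.
    """
    name, _, modifier = locale_name.partition("@")
    name = name.split(".", 1)[0]              # drop ".UTF-8" and similar
    language, _, country = name.partition("_")
    candidates = []
    for use_mod, use_country in ((True, True), (True, False), (False, True), (False, False)):
        if (use_mod and not modifier) or (use_country and not country):
            continue                          # that part is absent from the name
        candidate = language
        if use_country:
            candidate += "_" + country
        if use_mod:
            candidate += "@" + modifier
        candidates.append(candidate)
    return candidates


def find_catalog(program: str, locale_name: str,
                 root: Path = Path("/usr/share/locale")) -> Optional[Path]:
    """Return the first existing <root>/<candidate>/LC_MESSAGES/<program>.mo."""
    for candidate in catalog_candidates(locale_name):
        path = root / candidate / "LC_MESSAGES" / f"{program}.mo"
        if path.is_file():
            return path
    return None
```

For example, `catalog_candidates("fr_FR.UTF-8")` gives ["fr_FR", "fr"], which is the two-step search described above.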

Most of the time, translation files are named after the language only
(<language>.po, installed as
/usr/share/locale/<language>/LC_MESSAGES/<program>.mo). This keeps
translations country-neutral: if a French translation is named
fr_FR.po, it will only be used by people in France, not by those in
other French-speaking countries. If it is named fr.po, it is available
to every French-speaking user.
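A tiny Python sketch makes the difference visible. It models only the two steps described above (full ll_CC name, then the language code alone); the function name is hypothetical and the codeset and modifier are simply stripped.

```python
from typing import List, Optional


def pick_translation(locale_name: str, available: List[str]) -> Optional[str]:
    """Return the first available translation name for locale_name.

    Hypothetical helper: tries "ll_CC", then "ll" (codeset/modifier ignored).
    """
    base = locale_name.split("@", 1)[0].split(".", 1)[0]
    language = base.split("_", 1)[0]
    for candidate in (base, language):
        if candidate in available:
            return candidate
    return None
```

With only fr_FR available, a user with fr_BE.UTF-8 gets no French translation; with fr available, users with fr_FR, fr_BE, or fr_CA all get it.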


Currently, most existing Serbian translation files are named "sr" and
are ekavian. So, in short, using "sr_FOO" as the locale will give
people ekavian translations.

Of course, we could name the ijekavian files sr_BA, but that would
have two flaws:
- first, this was apparently unacceptable when we discussed it during
DebConf. I concluded that this was related to the rather specific
status of Republika Srpska ("a country within a country")... but I may
be wrong;
- second, the ijekavian variant would only be available to people
choosing BA as their country, and thus not to people who want to use
the ijekavian variant in another country.


In general, using the country as a modifier to designate a language
variant is not the best idea. There are precedents for this (pt_BR,
zh_CN/zh_TW), but it is better to avoid it when possible.

So the suggestion of using a modifier (@ijekavian) is indeed the best
one (thanks to Zlatan for bringing it up first; this is already what
some KDE programs use). Modifiers are made exactly for this. :-)

> would mean that we only have an sr_BA version, which is incorrect. As I've
> mentioned, Republika Srpska is a legitimate entity of BiH, in which the
> Serbian language is spoken and the official script is Cyrillic. On the
> other hand, in the Federation of BiH, the official language is Bosnian,
> with Latin as the official script.
> 
> So, I think, one way would be to name the "ijekavian" variant of the
> Serbian language in Cyrillic just sr_BA, since it's clear where it's
> spoken. And the Latin variant should be sr_BA@latin, which is just the
> Latin variant.

Unfortunately, with the mechanisms described above, that would make
the "sr" translations (that is, the ekavian ones) the fallback whenever
no "sr_BA" translation exists, resulting in a mix of ijekavian and
ekavian...

It would also restrict ijekavian to people choosing "Bosnia and
Herzegovina" as their country.
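The mix can be illustrated with a short Python sketch. The program names and the catalogs they ship are purely hypothetical, and the lookup is reduced to the two steps described earlier (ll_CC, then ll).

```python
from typing import Dict, Optional, Set


def resolve(locale_name: str, available: Set[str]) -> Optional[str]:
    """Simplified lookup: "ll_CC" first, then "ll" (codeset/modifier dropped)."""
    base = locale_name.split("@", 1)[0].split(".", 1)[0]
    for candidate in (base, base.split("_", 1)[0]):
        if candidate in available:
            return candidate
    return None


# Hypothetical programs and the translations each one ships, under the
# sr_BA naming scheme: "sr_BA" would be ijekavian, "sr" ekavian.
CATALOGS: Dict[str, Set[str]] = {
    "program-a": {"sr", "sr_BA"},
    "program-b": {"sr"},          # no sr_BA file written yet
}


def variants_seen(locale_name: str) -> Dict[str, Optional[str]]:
    """Which translation each hypothetical program would display."""
    return {prog: resolve(locale_name, cats) for prog, cats in CATALOGS.items()}
```

For an sr_BA.UTF-8 user, program-a shows the "sr_BA" (ijekavian) text while program-b silently falls back to "sr" (ekavian).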


> 
> The other way would be to just leave:
> 
> sr@ijekavian: Ijekavian variant, Cyrillic
> sr@ijekavianlatin: Ijekavian variant, Latin
> 
> Because the ijekavian variant of the Serbian language is spoken only in
> Republika Srpska. And it is a *Serbian* language.

From the sources I have seen, the variant spoken in Montenegro is
closer to ijekavian than to ekavian. People in Montenegro are currently
trying to have it named "Montenegrin" because of its specific features
(mostly extra letters). So, well, we can probably ignore this...

> I guess we don't have to include the country code? Or do we? In that case,
> the most correct thing would be sr_BA, 'cause
> sr_RS-BA doesn't look nice. :)

We have to include the country code in the *locale*, but not in the
name of translation files.


> 
> Maybe I confused everyone with everything :), but that's just my point of
> view.

It seems we all have the same point of view... but all these things
are sometimes so confusing that it can be hard to understand each
other. :)

In short, we have two things to name:

- the translation files: must include a language code, may include
a country code (though this should be avoided), may include a modifier
- the locale(s): must include a language code, must include a
country code, may include a modifier
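These two rules can be written down as a pair of regular expressions. This is a sketch under assumptions of my own: language codes are taken to be 2 or 3 lowercase letters, country codes 2 uppercase letters, modifiers lowercase words, and locale names may carry an optional codeset such as .UTF-8. The pattern and function names are hypothetical.

```python
import re

# Hypothetical patterns encoding the two naming rules above.
TRANSLATION_RE = re.compile(
    r"(?P<lang>[a-z]{2,3})(?:_(?P<country>[A-Z]{2}))?(?:@(?P<mod>[a-z]+))?")
LOCALE_RE = re.compile(
    r"(?P<lang>[a-z]{2,3})_(?P<country>[A-Z]{2})"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?(?:@(?P<mod>[a-z]+))?")


def check_translation_name(name: str) -> str:
    """Language required; country allowed but discouraged; modifier optional."""
    m = TRANSLATION_RE.fullmatch(name)
    if m is None:
        return "invalid"
    if m.group("country"):
        return "valid, but country code should be avoided"
    return "valid"


def check_locale_name(name: str) -> str:
    """Language and country required; codeset and modifier optional."""
    return "valid" if LOCALE_RE.fullmatch(name) else "invalid"
```

So "sr@ijekavian" is a fine translation file name but not a locale name, while "sr_BA@ijekavian" is a valid locale name.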

Final proposal:

Translation files:
Serbian, ijekavian variant, cyrillic: sr@ijekavian
Serbian, ijekavian variant, latin: sr@ijekavianlatin
Serbian, ekavian variant, cyrillic: sr
Serbian, ekavian variant, latin: sr@latin
  (the latter two are already used)

Locales:
Serbian, ekavian variant, cyrillic, Serbia: sr_RS  (already exists)
Serbian, ekavian variant, latin, Serbia: sr_RS@latin  (already exists)
Serbian, ijekavian variant, cyrillic, Bosnia and Herzegovina: sr_BA@ijekavian  (to be written)
Serbian, ijekavian variant, latin, Bosnia and Herzegovina: sr_BA@ijekavianlatin  (to be written)
Bosnian, latin, Bosnia and Herzegovina: bs_BA (already exists)

(I used "Bosnia and Herzegovina" above to refer to the whole
country, i.e. both entities, not the entity named "Federation of
Bosnia and Herzegovina")
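To check that the proposed locales pick up the proposed translation files, here is a Python sketch. The lookup order (ll_CC@mod, ll@mod, ll_CC, ll, codeset ignored) is my assumption about glibc's behaviour and should be verified against its sources; the function names are hypothetical.

```python
from typing import List, Optional, Set


def candidates(locale_name: str) -> List[str]:
    """Assumed glibc-like order: ll_CC@mod, ll@mod, ll_CC, ll (codeset ignored)."""
    name, _, mod = locale_name.partition("@")
    lang, _, country = name.split(".", 1)[0].partition("_")
    order = []
    if mod:
        if country:
            order.append(f"{lang}_{country}@{mod}")
        order.append(f"{lang}@{mod}")
    if country:
        order.append(f"{lang}_{country}")
    order.append(lang)
    return order


def pick(locale_name: str, translations: Set[str]) -> Optional[str]:
    """First translation file name matching the locale, or None."""
    return next((c for c in candidates(locale_name) if c in translations), None)


# Translation file names from the final proposal above.
PROPOSED = {"sr", "sr@latin", "sr@ijekavian", "sr@ijekavianlatin"}
```

Under this model, sr_BA@ijekavian picks sr@ijekavian, sr_RS@latin picks sr@latin, and sr_RS picks sr. Because the translation names carry no country, an ijekavian locale defined for another country (a hypothetical sr_ME@ijekavian, say) would use the same sr@ijekavian files.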

I am leaving aside the sr_ME locale because I think it may soon become
obsolete once Montenegrin is defined separately in ISO 639 (which
might happen, more for political than for linguistic reasons).
