
Working on a locale to allow the use of ijekavian variant?



After our discussions at DC11, we concluded that the best way to
represent the two different variants of the Serbian language would
be to use a modifier.

So, the conclusion was:

sr: Ekavian variant, written in Cyrillic
sr@latin:  ditto in Latin

sr@ijekavian: Ijekavian variant, Cyrillic
sr@ijekavianlatin: Ijekavian variant, Latin

(I hope I'm now writing "ijekavian" the right way... if not, please
accept my apologies and correct me, so that it doesn't happen
again. :-))

So, that solves a great part of the problem. However, for these
variants to be used, we need locale files to exist so that people can
define them in their environment (this is indeed done by D-I when the
language is chosen: the appropriate locale is defined in the user's
environment, from the combination of chosen language and country).

As of now, glibc has three "Serbian" locales:
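
To illustrate how a language with a modifier and a country combine
into a locale name, here is a minimal sketch. It is not D-I's actual
code: the function name compose_locale is hypothetical, and the only
assumption is the usual ll_CC@modifier naming pattern.

```python
def compose_locale(language: str, country: str) -> str:
    """Build 'll_CC@modifier' from a language code (which may carry an
    @modifier) and a two-letter country code.

    Hypothetical helper for illustration only; not taken from D-I.
    """
    # Split "sr@ijekavian" into ("sr", "@", "ijekavian");
    # a plain "sr" yields ("sr", "", "").
    base, sep, modifier = language.partition("@")
    # The country goes between the language and the modifier.
    return f"{base}_{country}{sep}{modifier}"


if __name__ == "__main__":
    for lang in ("sr", "sr@latin", "sr@ijekavian", "sr@ijekavianlatin"):
        print(compose_locale(lang, "BA"))
```

The point of the sketch is only that the modifier has to survive the
combination step, so each language/country pair that can be selected
needs a matching locale file.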

cperrier@mykerinos:~$ ls -l /usr/share/i18n/locales/sr*
-rw-r--r-- 1 root root 4940 2011-08-09 01:03 /usr/share/i18n/locales/sr_ME
-rw-r--r-- 1 root root 9856 2011-08-09 01:03 /usr/share/i18n/locales/sr_RS
-rw-r--r-- 1 root root 5465 2011-08-09 01:03 /usr/share/i18n/locales/sr_RS@latin

Indeed, from what I see, the sr_ME locale seems to be ijekavian:

A diff between both files (converted from U+xxxx notation to UTF-8
with the attached script), gives things like:

 LC_TIME
-abday   "нед";"пон";"уто";"сри";"чет";"пет";"суб"
-day     "недјеља";"понедељак";"уторак";"сриједа";"четвртак";"петак";"субота"
+
+abday   "нед";"пон";"уто";"сре";"чет";"пет";"суб"
+day     "недеља";"понедељак";"уторак";"среда";"четвртак";"петак";"субота"

...which, from my very basic understanding of the language, is a good
illustration of the differences between ekavian and ijekavian.
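
As a quick check, the day names from the diff above can be compared
position by position. This is only a sketch: the two lists are copied
from the diff, and the helper name differing_pairs is made up for the
example.

```python
# Full day names copied from the LC_TIME diff: the "-" lines carry the
# ijekavian-looking forms, the "+" lines the ekavian ones.
DAYS_IJEKAVIAN = ["недјеља", "понедељак", "уторак", "сриједа",
                  "четвртак", "петак", "субота"]
DAYS_EKAVIAN = ["недеља", "понедељак", "уторак", "среда",
                "четвртак", "петак", "субота"]


def differing_pairs(left, right):
    """Return (left, right) pairs for positions where the lists differ."""
    return [(a, b) for a, b in zip(left, right) if a != b]


if __name__ == "__main__":
    for ije, eka in differing_pairs(DAYS_IJEKAVIAN, DAYS_EKAVIAN):
        print(f"{ije} / {eka}")
```

Only Sunday and Wednesday differ in this sample, so most of LC_TIME
could be shared between the two variants.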

So, it seems that a good basis for a locale using sr@ijekavian as
language would be sr_ME.

(by the way, it seems that using ijekavian in sr_ME is not a very good
idea...this is indeed the same "trick" I was originally proposing with
"sr_BA" being an ijekavian locale)

If we go this way, the "only" thing left to do is to choose the
"country" part (as, of course, the country-related things like postal
codes, currency, etc. can't be copied from those of Montenegro).

Of course, this might not be as easy as just saying it... as the only
choice we can make is indeed sr_BA@ijekavian (and sr_BA@ijekavianlatin).

Writing the locale is very easy: it requires basic knowledge about
language+country and we can do it easily in a few days here on the list.


But, of course, we first need to be sure about the locale name.

Comments?

-- 



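#! /usr/bin/perl
# Convert glibc locale source from <Uxxxx> notation to UTF-8 text.
# Assumes the input itself is plain ASCII.

use strict;
use warnings;

# Write the decoded characters to standard output as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

# Replace every <Uxxxx> escape in a string with the character it names.
sub c {
	my $text = shift;
	$text =~ s/<U([0-9A-Fa-f]{4})>/chr(hex($1))/eg;
	return $text;
}

my $last = '';
while (<>) {
	# Prepend any pending continuation from the previous line(s).
	if ($last ne '') {
		$_ = $last . $_;
		$last = '';
	}
	# A trailing "/" continues the entry on the next line: join them.
	if (m/\/\s*$/s) {
		s/\/\s*$//s;
		$last = $_;
		next;
	}
	# Decode escapes inside quoted strings only.
	s/"([^"]*)"/'"'.c($1).'"'/eg;
	# Drop the whitespace left between joined list items.
	s/";\s*"/";"/g;
	print;
}
# Flush a continuation left over if the input ends on a "/" line.
print $last if $last ne '';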


