Re: Advice on bootstrapping l10n for gayo language

Quoting Jonas Smedegaard (dr@jones.dk):

> It seems gayo has an official language code [gay], but that locale is 
> not represented in /usr/share/i18n/SUPPORTED or other places I randomly 
> looked.  So I suspect that I have a case which does not need definition 
> but does need recognition fundamentally in the system.
> What are your experiences in bootstrapping "new" languages?  Do you 
> perhaps even have - or know about - some nice HOWTO on that?

I have already written a few locales from scratch. It's fairly easy.

If you look at the files in /usr/share/i18n/locales, you'll see that
they are plain text files.

They contain things like the names of the days of the week (full and
abbreviated), the currency name and symbol, the postal address format,
etc. In short, things related to either the language or the country.

That's why a locale is always a language+country combination.
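To make the language+country pairing concrete, here is a minimal Python sketch that splits a locale name into its parts. It assumes the usual glibc naming pattern language_COUNTRY, optionally followed by .codeset and @modifier; the helper name split_locale is hypothetical and not part of any existing tool.

```python
import re

# Hypothetical helper: a locale name is language_COUNTRY, optionally
# followed by .codeset and @modifier (e.g. "id_ID", "gay_ID.UTF-8").
LOCALE_NAME = re.compile(
    r"^(?P<language>[a-z]{2,3})"      # ISO 639 language code, e.g. "gay"
    r"_(?P<country>[A-Z]{2})"         # ISO 3166 country code, e.g. "ID"
    r"(?:\.(?P<codeset>[^@]+))?"      # optional character set, e.g. "UTF-8"
    r"(?:@(?P<modifier>.+))?$"        # optional variant, e.g. "euro"
)


def split_locale(name: str) -> dict:
    """Split a locale name into its parts; raise ValueError if malformed."""
    match = LOCALE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a language_COUNTRY locale name: {name!r}")
    return match.groupdict()


print(split_locale("gay_ID.UTF-8"))
```

The language part says which texts (day names, yes/no answers) to write; the country part decides things like currency and phone formats, which is why the Indonesian locale below is a good donor.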

Here, what's needed is writing a gay_ID locale (Gayo as used in Indonesia).

I usually take an existing locale as a basis and modify it. If there's a
locale for the same language in another country, that's the best
starting point. Otherwise, you can take a locale for another language in
the same country. Or... you start from scratch. :-)

Locales are text files; the only trick is that characters in locale
files are written as their Unicode code points, in the form <Uxxxx>
(e.g. <U0061> for "a"). In short, a locale is an ASCII file.
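As an illustration of that notation, this small Python sketch writes a string the way it appears in a locale file. The function name to_uxx is hypothetical; the 8-digit form for characters above U+FFFF is the convention used by glibc charmaps, and the sample word "Senin" (Indonesian for Monday) is just test input.

```python
def to_uxx(text: str) -> str:
    """Write each character as <Uxxxx>, or <Uxxxxxxxx> above U+FFFF."""
    return "".join(
        f"<U{ord(ch):04X}>" if ord(ch) <= 0xFFFF else f"<U{ord(ch):08X}>"
        for ch in text
    )


print(to_uxx("Senin"))  # <U0053><U0065><U006E><U0069><U006E>
```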

To make it easier, Denis Barbier wrote two scripts (attached) that
convert between UTF-8 encoded files and this "uxx" format in either
direction.

This way, you start from an existing locale, convert it to UTF-8 with
uxx2utf, edit it, then convert it back with utf2uxx.
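The round trip described above can be sketched in Python on a single line. This is not Denis Barbier's code: the helper names uxx_to_utf and utf_to_uxx are hypothetical, and the sample day line is made up. Like the attached scripts, it only touches text inside double quotes.

```python
import re

UXX = re.compile(r"<U([0-9A-Fa-f]{4,8})>")  # one <Uxxxx> escape
QUOTED = re.compile(r'"([^"]*)"')           # one double-quoted string


def uxx_to_utf(line: str) -> str:
    """Decode <Uxxxx> escapes inside double-quoted strings only."""
    def decode(q):
        return '"' + UXX.sub(lambda m: chr(int(m.group(1), 16)), q.group(1)) + '"'
    return QUOTED.sub(decode, line)


def utf_to_uxx(line: str) -> str:
    """Re-encode every character inside double-quoted strings."""
    def enc(ch):
        cp = ord(ch)
        return f"<U{cp:04X}>" if cp <= 0xFFFF else f"<U{cp:08X}>"
    return QUOTED.sub(lambda q: '"' + "".join(map(enc, q.group(1))) + '"', line)


original = 'day "<U004D><U0069><U006E><U0067><U0067><U0075>";/'
readable = uxx_to_utf(original)
print(readable)  # day "Minggu";/
assert utf_to_uxx(readable) == original  # the round trip is lossless
```

Unlike the attached utf2uxx, this sketch does not special-case copy/include lines, whose quoted arguments are file names and must stay in plain ASCII.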

I would recommend starting from /usr/share/i18n/locales/id_ID, as it
already has the right country-specific information for Indonesia.
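Before editing a copy of id_ID, it can help to see which categories it defines: country-related ones such as LC_MONETARY or LC_TELEPHONE can likely be kept, while language-related text such as the day names in LC_TIME needs Gayo equivalents. Here is a minimal Python sketch, assuming each category opens with a bare LC_... line and closes with END LC_...; the helper name list_categories and the trimmed sample are hypothetical.

```python
import re

# A category section opens with a bare "LC_..." line ("END LC_..." closes it).
SECTION_START = re.compile(r"^(LC_[A-Z]+)$")


def list_categories(lines):
    """Return the LC_* categories defined in a locale source, in file order."""
    found = []
    for line in lines:
        match = SECTION_START.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


# Hypothetical, heavily trimmed locale source; an open text file such as
# /usr/share/i18n/locales/id_ID can be passed instead of these lines.
sample = """\
LC_IDENTIFICATION
title "Gayo locale for Indonesia"
END LC_IDENTIFICATION

LC_CTYPE
copy "i18n"
END LC_CTYPE

LC_TIME
day "<U004D><U0069><U006E><U0067><U0067><U0075>"
END LC_TIME
"""

print(list_categories(sample.splitlines()))
```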

> ...or would you perhaps recommend against attempting such a challenge 
> for a small community of 200k-500k persons?

Certainly not. There are already efforts for such "small" communities
(think of Northern Sami).

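#! /usr/bin/perl
# Convert the <Uxxxx> escapes inside quoted strings of a locale file
# into UTF-8 text. Reads the files named on the command line (or STDIN)
# and writes to STDOUT.
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));

# Replace each <Uxxxx> (or <Uxxxxxxxx>) escape with the character it names.
sub c {
	my $text = shift;
	$text =~ s/<U([0-9A-Fa-f]{4,8})>/chr(hex($1))/ge;
	return $text;
}

# Decode the escapes in every double-quoted string on a line.
sub convert_line {
	my $line = shift;
	$line =~ s/"([^"]*)"/'"'.c($1).'"'/eg;
	return $line;
}

my $last = '';
while (<>) {
	# A line ending in "/" continues on the next one; join them so that a
	# string split across lines is converted in one piece.
	if ($last ne '') {
		$_ = $last . $_;
		$last = '';
	}
	if (m/\/\s*$/s) {
		$last = $_;
		next;
	}
	print convert_line($_);
}
# Flush a continuation line left over at end of input.
print convert_line($last) if $last ne '';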

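#! /usr/bin/perl
# utf2uxx: convert the UTF-8 text inside quoted strings of a locale file
# back to <Uxxxx> escapes. Input is decoded as UTF-8.
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));

# Escape each character as <Uxxxx> (<Uxxxxxxxx> above U+FFFF); when
# $convert_ascii is false, ASCII characters are kept as they are.
sub c {
	my ($text, $convert_ascii) = @_;
	my $ret = '';
	foreach my $ch (split //, $text) {
		my $l = ord($ch);
		if (!$convert_ascii && $l < 0x80) {
			$ret .= $ch;
		} else {
			$ret .= sprintf($l > 0xFFFF ? "<U%08X>" : "<U%04X>", $l);
		}
	}
	return $ret;
}

my $convert_ascii = 1;
while (<>) {
	# The conditions that toggled $convert_ascii were lost from the archived
	# copy; keeping LC_IDENTIFICATION strings in plain ASCII is an assumption.
	if (/^LC_IDENTIFICATION/) {
		$convert_ascii = 0;
	} elsif (/^END LC_IDENTIFICATION/) {
		$convert_ascii = 1;
	}
	my $conv = $convert_ascii;
	# Arguments of "copy" and "include" name other files; never escape them.
	$conv = 0 if (/^(copy|include)/);
	s/"([^"]*)"/'"'.c($1, $conv).'"'/eg;
	print;
}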
