Re: Adding New Script Variants on Debian Installer

On Sun, Mar 18, 2012 at 6:03 AM, Christian PERRIER <bubulle@debian.org> wrote:

Quoting Eagle Burkut (eagle.burkut@gmail.com):

> While we have finished translating more than 95% of the translation strings in
> Ubuntu (that is 350,000 sentences!), due to lots of bugs and limitations in
> various libraries/packages/fonts, we got a localized Ubuntu with lots of
> bugs/errors. In summary, the original Ubuntu has problems, bugs in every
> aspect of

Hmmm, everything I like about the Ubuntu l10n infrastructure: even upstream
software is translated through Rosetta/Launchpad and nobody has any
idea whether the translations end up in upstream software. In short:
they benefit nobody but Ubuntu users. VERY sad.

Why wouldn't Fedora users be able to use Libreoffice, Firefox, Gimp,
etc, in Uyghur?

LibreOffice and Firefox are translated upstream; that is not part of the Ubuntu translation process.

Anyway...let's just hope that Ubuntu/Canonical have a magic system for
making this happen...but I would be surprised.

What makes Ubuntu different from other Linux distributions is that, in my view, Ubuntu/Canonical is working harder to actively support the development, localization, and translation workflow. I have not seen the same level of support and effort from other distributions.

>
> (1) Right-to-Left support
> (2) Bi-Directional support
> (3) Font selection
> (4) Proper fonts
>
> Thus, even with hundreds of translators spending a couple of years of hard
> work, we still ended up with a not so well localized Ubuntu.
>
> If we utilize Latin based Uyghur, all of the above problems will disappear
> immediately. That is why we are thinking about adding Latin based Uyghur to
> Ubuntu, including the Debian Installer.

Then why not work on these problems rather than circumventing them with
hacks?

We tried very hard to solve these problems. There are many parties involved, and it is not easy to identify the source of each bug. Worst of all, almost all sides deny that the bugs really come from their work.

First of all, there is no proper font supporting Uyghur in the open source community. Unfortunately, due to the size limit of the Ubuntu distribution, we cannot have our own fonts included in the distribution ISO/CD. And with the mixture of the Ubuntu-selected fonts, FontConfig, Unity, and Gnome, almost all parties said that the bugs are not related to them and come from other sources...

With limited manpower and technical expertise, we cannot check all of the source code line by line to identify the sources of the bugs. It is really frustrating.

> As Uyghur is written in a modified Arabic-Persian script in China, in Cyrillic in
> Kazakhstan and Central Asia, and in Latin elsewhere globally, we previously
> had the ug_CN locale in Arabic, and we have just developed the ug_US locale in
> Latin and submitted it to the upstream glibc library; moreover, we are planning
> to develop a ug_KZ locale in Cyrillic in the near future.

I'm afraid this is an incorrect use of locales.

Country modifiers shouldn't be used for language variants. So a ug_US
locale means nothing (Uyghur in United States of America? Why not
ug_UK or ug_PT or whatever?).

Latin based Uyghur is not used inside China or Kazakhstan at all. It is used mainly outside of China, especially in North America and other western countries. Among them, the United States is the country where Latin based Uyghur is used most heavily among Uyghur residents.

The correct locale for "Uyghur written in Latin script" should be
"ug_CN@latin" (and eventually ug_XX@latin, in case Uyghur is widely
used in a country other than China, such as Kazakhstan, for
instance).

And, in case Uyghur is also written in Cyrillic, as you seem to imply,
then the right locale would be ug_XX@cyrillic, where XX is the country
where the Cyrillic variant is most widely used... I guess KZ,
then.

I would strongly advise against using ug_KZ to denote "Uyghur written
in Cyrillic". It should be kept for "Uyghur written in Arabic script,
in Kazakhstan" (and if that means nothing, as Uyghur is never written
in Arabic script in KZ, then don't create that locale but create
ug_KZ@cyrillic instead).

But, really, ug_US should not happen and I doubt upstream glibc
maintainers will accept it.

Before we settled on that naming convention, we did quite a bit of research. There are a couple of obstacles to adopting the other approach.

First, the FontConfig library does not support language/script variants with @, so it cannot distinguish ug_CN from ug_CN@latin. There is no way to develop an orthography rule file for ug_CN@latin and have it included in FontConfig.

Second, the online translation platform Launchpad does not support a second language/script variant of the same language in the same country. These are original design issues, and there is no clear indication that they will be fixed in the foreseeable future.

Lastly, while researching this issue, we noticed the case of the Kurdish language in Turkey, Iraq and Iran. There, the locales have just a language and a country code, with no @ variants. Our case is exactly the same as that of Kurdish, nothing more and nothing less. The @ variant has to be used when the same language is written in different scripts within the same country, as with uz_UZ@latin and uz_UZ@cyrillic. For the same language in different countries, there is no need to add the @ variant, as in the case of Kurdish. In our case, the different writing systems of the same language are used in different countries, not within one country, so we think the @ variant is not necessary.
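
To make the two naming conventions in this thread concrete, here is a minimal sketch that splits a POSIX-style locale name of the form language[_TERRITORY][.codeset][@modifier] into its parts. The names parse_locale and LocaleName are hypothetical helpers written for this illustration; they are not part of glibc, FontConfig, or Launchpad, and the regular expression is a simplification that assumes two- or three-letter language codes and two-letter territory codes.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for language[_TERRITORY][.codeset][@modifier].
# Assumption: real locale names can be more varied than this accepts.
_LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


class LocaleName(NamedTuple):
    """Hypothetical container for the parts of a locale name."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def parse_locale(name: str) -> LocaleName:
    """Split a locale name into language, territory, codeset and modifier."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a POSIX-style locale name: {name!r}")
    # Optional groups that did not match come back as None.
    return LocaleName(**match.groupdict())


if __name__ == "__main__":
    for name in ("ug_CN", "ug_CN@latin", "ug_KZ@cyrillic", "ug_US", "uz_UZ@cyrillic"):
        print(name, parse_locale(name))
```

With the @ convention, the script is carried explicitly in the modifier field (for example "latin" in ug_CN@latin). With the country-only convention, ug_US parses with no modifier, so the script has to be inferred from the territory by whoever consumes the locale name.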