
Re: possible ICU transition

Steinar H. Gunderson wrote:

On Fri, Aug 05, 2005 at 09:04:34PM +0200, Andreas Fester wrote:
being involved in an i18n project currently, I also learned about
ICU recently. I was a little bit disappointed to find only a very old
version in the Debian archive, so it's very good to see that the package
is still maintained and that you are working on the current 3.4 version

While we're at all the i18n (is there a more proper list for this, BTW?)
business: Is there a good way of getting ordinal numbers for a given locale?
I.e. func(1) = “1st”, func(2) = “2nd”, func(3) = “3rd” etc. for LANG=en_US,
func(1) = “1ère”, func(2) = “2ème”, func(3) = “3ème” etc. for LANG=fr_FR and
so on...

/* Steinar */

Actually, ICU can do that. Search for the word "ordinal" at http://icu.sourceforge.net/apiref/icu4c/classRuleBasedNumberFormat.html for a lead.

I would recommend against it, however, for two reasons:
1. ICU's link requirements are horrible. If you link with a particular version of ICU, the version number gets mangled into the function names, so that replacing the shared object at a later point is just a big no-no. In addition, if your project is not C++, you are creating a dependency on the C++ runtime library, which is often a problem. I introduced an ICU dependency into Wine to add BiDi support, and the code is rotting away there, because it is basically unmaintainable.

2. The actual concept is flawed. ICU is trying to give the impression that you can just type in a number and get it translated into a textual representation without knowing anything in advance about the language (http://www-950.ibm.com/software/globalization/icu/demo/locales). The problem is that the people doing the designing are usually Western people. This means that it works great for Latin-based languages, but the further away you travel, the less well it works.

Example: In English: 1st (or "first"). In French, 1ère. In Hebrew? Well, it depends. If you are counting a masculine object (and in Hebrew, nouns have a gender), that would be "ראשון", otherwise it would be "ראשונה". The form depends on the noun.

About a year ago, ICU simply did not have an answer to that problem. I just looked again to find out the answer to your original question, and they do appear to have masculine vs. feminine forms. Give them 10/10 for effort. I fail to see how that can solve the problem, though. Even if you were somehow made aware that some languages have this distinction (a great advancement already), you have no clue what the object's gender should be. Please bear in mind that Hebrew is not the only language to have gendered nouns. Also bear in mind that while "table" is masculine in Hebrew, it may well be feminine in another language.
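
To make the point concrete, here is a minimal Python sketch. It is not ICU's API; the function names and the small Hebrew table are hypothetical, made up purely for illustration. The English function needs nothing but the number, while the Hebrew one cannot produce an answer unless the caller already knows the gender of the noun being counted.

```python
def english_ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3.
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Hypothetical lookup table: the Hebrew word for "first" has two forms,
# and which one applies is decided by the counted noun, not by the number.
HEBREW_FIRST = {"masculine": "ראשון", "feminine": "ראשונה"}


def hebrew_first(noun_gender: str) -> str:
    """Return the Hebrew word for 'first' for a noun of the given gender."""
    # The caller has to supply the gender; the number alone is not enough.
    return HEBREW_FIRST[noun_gender]
```

Note that the extra parameter is the whole problem: a generic func(1) interface has nowhere to pass it, and the code formatting the number usually does not know which noun will follow.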

In short, please think carefully before using an automatic solution for generating your numbers. Things may not be as simple as you think.


Shachar Shemesh
Lingnu Open Source Consulting ltd.
