
Precomposed Unicode layouts and permutations (was: Re: hello + UTF-8)

On 07/08/2005, at 5:44 PM, Steve Langasek wrote:

Undoubtedly she should be using a precomposed layout, and I really
wonder whether the charmap _is_ a precomposed layout, since the position
of diacritics varies between apps. That tends to be an artifact of
decomposed input, where the character is not input as one whole
character: the vowel and accents are input separately, and thus can
become separated during display. In my becoming-famous case, unanchored
accents even chased the cursor around the page!

FWIW, I'm pretty sure there is no such thing as a precomposed layout for
Devanagari script; the combinatorics (pairing each possible vowel sign with
each possible consonant character, plus arbitrary numbers of combining forms
for consonant clusters) don't lend themselves to assigning a separate
Unicode codepoint for each combination, and indeed, I don't see any sign of
these combos in Unicode.
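
One way to check this from the Unicode data itself is to ask whether a consonant plus vowel sign collapses into a single code point under canonical composition (NFC). The following is only a minimal sketch using Python's standard unicodedata module; the helper name nfc_length and the sample syllables are chosen purely for illustration.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (joins consonants into a cluster)
SSA = "\u0937"     # DEVANAGARI LETTER SSA


def nfc_length(text: str) -> int:
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))


# Consonant + vowel sign: NFC has no single code point to compose into.
print("KA + SIGN I after NFC:", nfc_length(KA + SIGN_I), "code points")

# Consonant cluster (KA + VIRAMA + SSA): also stays as a sequence.
print("KA + VIRAMA + SSA after NFC:", nfc_length(KA + VIRAMA + SSA), "code points")
```

If the counts stay above one, the syllable exists only as a sequence, so rendering it correctly depends on the shaping done by each application.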

How many combinations are we talking about? With Vietnamese, the tones mean we have seventy-two vowels, a small enough set that precomposed layouts work for us. Without precomposed characters, and until Level 2 Unicode is properly supported, those of us whose languages use combining diacritics have severe problems with consistent input and display across a range of software.
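
To make the precomposed/decomposed distinction concrete, here is a rough sketch in Python (standard unicodedata module only) showing one Vietnamese vowel in both forms. It illustrates the code-point level only, not any particular keyboard layout, and the helper name code_points is just for this example.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Precomposed: one code point for the whole vowel.
precomposed = "\u1EC7"  # LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW

# Decomposed (NFD): base letter followed by separate combining marks.
decomposed = unicodedata.normalize("NFD", precomposed)

print("precomposed:", code_points(precomposed))
print("decomposed: ", code_points(decomposed))

# The two strings differ code point by code point, yet they are
# canonically equivalent: NFC turns the sequence back into one character.
print("equal as stored:", precomposed == decomposed)
print("equal after NFC:", unicodedata.normalize("NFC", decomposed) == precomposed)
```

Software that handles the decomposed sequence poorly has three separate code points to position, which is where detached or wandering accents can come from.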

It's an important point, Steve, and hopefully we'll learn more from the Indic languages projects we've found. Evidently it's not a trivial implementation, as it wasn't for my language, but once you get there...

from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do)
