Re: hello + UTF-8

Jutta, this is getting really interesting from the UTF-8 point of view. Thank you for your findings. :)

On 07/08/2005, at 7:12 AM, Jutta Wrage wrote:

The following order is correct, but the final display in any document
has mistakes.

Maybe a new discovery I made helps here:

I visited http://www.linux-india.org/ with two different browsers.
The page is broken, but the important thing can be seen:

The third line on the right also has one of the vowel signs in question applied.
I looked at the page with two OS X apps, but that should be no problem.

- Omniweb shows the f-like character behind the first letter.
- Safari shows it in front of the first letter.

So it must be more of an application problem. If I paste it into a mail, both of them look the same:

लिनक्स-इन्डिया में आपका स्वागत है।

I see the f-like thingy at the beginning of the line here when writing the mail. That is how Safari shows it, not how Omniweb shows it.
I think the same might happen with different X applications.

The problem now is to find out where the vowel sign should be placed. Then one can file bugs. ;-)
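
One way to check what is actually stored, independent of any browser, is to list the code points of the pasted text. The following is only a minimal Python sketch: the helper name describe is made up for illustration, and the word is the first word of the line above, written with escapes so no font is needed. In the Unicode encoding model the "f-like" sign (DEVANAGARI VOWEL SIGN I) is stored after the consonant it belongs to, and the renderer is expected to draw it to the left of that consonant, which would match what Safari and the mail program show.

```python
import unicodedata

# First word of the pasted line ("Linux" in Devanagari), written with
# escapes so the example does not depend on the editor's font support.
word = "\u0932\u093f\u0928\u0915\u094d\u0938"

def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Print one line per stored character, in the order they appear in memory.
for code, name in describe(word):
    print(code, name)
```

If the vowel sign is listed second, after the first consonant, the text itself is in the expected order and any difference on screen comes from the application's rendering.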

The really interesting thing here is that this eliminates the decomposed/precomposed Unicode bug in this case: both Safari and OmniWeb are Cocoa applications, and thus will display both decomposed and precomposed Unicode appropriately.

I still need to test this more with svashka's languages, although they have the same combined-diacritic issues that mine does.

Undoubtedly she should be using a precomposed layout, and I really wonder whether the charmap _is_ a precomposed layout, since the position of the diacritics varies between apps. That tends to be an artifact of decomposed input, where the character is not input as one whole character: the vowel and accents are input separately, and can thus become separated during display. In my becoming-famous case, unanchored accents even chase the cursor around the page!
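
To make the precomposed/decomposed distinction concrete, here is a small Python sketch using the standard unicodedata module. The Vietnamese letter is chosen only as an illustration, and the helper name code_points is hypothetical. It shows the same letter as one precomposed code point and as a base letter followed by separate combining marks.

```python
import unicodedata

# A Vietnamese letter with two diacritics, as a single precomposed code point.
precomposed = "\u1ec7"

def code_points(text):
    """List the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# NFD splits the letter into a base letter plus combining marks;
# NFC puts the pieces back together where a precomposed form exists.
decomposed = unicodedata.normalize("NFD", precomposed)
recomposed = unicodedata.normalize("NFC", decomposed)

print(code_points(precomposed))
print(code_points(decomposed))
print(recomposed == precomposed)
```

An application that handles the decomposed sequence badly can draw the marks detached from the base letter, which is the kind of symptom described above.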

Danilo Šegan of Gnome-i18n has come through with some excellent info:

http://indlinux.org/ sounds like a good starting point.

This is wonderful: it looks like they have an entire Linux distro for Indic languages.

There is also linux-utf8 list, and xkeyboard-config list for
development of keyboard layout maps.

However, if input of Sanskrit and Marathi requires "input method"
support, you might want to look into different input method mechanisms
(XIM, Gtk+ IM,...).

Thank you for any help you can offer with this. Is there a list where
one should discuss Unicode input and display?

Not a centralised one, no, but linux-utf8 sounds like a good starting point.

and this list looks like an excellent resource for these sorts of problems.

We'll collect the information that comes out of this investigation and post a summary here as well as on the appropriate i18n lists, since this was originally a D-W translator enquiry. :)

Please continue to contribute your experience in this area: all Unicode Level 1->2 and combined diacritics experience, in particular, is very welcome.

Jutta, are you a member of the omniweb-l list? If not, I'll post there as well, and ask about the varying display. It's a particularly useful clue... and the Omniweb people are always very responsive and skilled in these issues.

from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do)
