
Re: Bug#496266: UTF-8 string characters not properly recognized



Christian Perrier wrote:
>> On Saturday 23 August 2008 at 19:59 -0500, Adam Majer wrote:
>>> Package: gedit
>>> Version: 2.22.3-1
>>> Severity: normal
>>>
>>> The following UTF-8 string is not correctly handled in gedit:
>>>
>>> const char *unicode_insert = "?Э";
>>>
>>> The " and the ? characters are viewed as one character, making the
>>> entire thing next to impossible to copy/paste/edit.
>>
>> Looks like an issue in pango, since it is not specific to gedit.
>>
>> Such things seem to happen a lot when using Tibetan characters, so this
>> may or may not be intentional. I’d prefer to have the input of someone
>> who uses them. Is there anyone on debian-i18n who’s more knowledgeable
>> about Tibetan glyphs?
>
> Adding Pema Geyleg and Tenzin Dendup, our fellow Dzongkha translation
> coordinators, who certainly have expertise in Tibetan-family scripts
> (Dzongkha is one of these) and could perhaps point you to people with
> the needed knowledge.


I'm sorry, but aren't we missing the entire point here? This is not
about bad handling of some Tibetan characters. It is about bad handling
of 3-byte UTF-8 characters.

http://en.wikipedia.org/wiki/UTF-8

So the following characters should have the same problem:

"ऄक

"ঈউঊ

"ਜਗਏ

"ଜଁଂ

"ஔ

"ంఁః

"ಂಖ

"ഈഃ

etc.


I've put an ASCII " in front of each group of characters. In Emacs, I'm able to select the " in front of these characters and copy it. Vim in a UTF-8 GNOME Terminal also allows the " to be selected. On the second-to-last line above, using Icedove, I can't select the " on its own, but I can select the " and ಂ together and then remove the second character.

Maybe it is just my misunderstanding of UTF-8; I'm not sure. But at least the behaviour I expected was to be able to select one UTF-8 character at a time, even if linguistically that makes no sense.

- Adam

