Re: Bug#496266: UTF-8 string characters not properly recognized
On Tuesday 02 September 2008 19:12:21 Changwoo Ryu wrote:
> 2008-09-02 (Tue), 13:19 -0500, Adam Majer:
> > Maybe it is just my misunderstanding of UTF-8, I'm not sure. But at
> > least my expected behaviour was being able to select 1 UTF-8 character
> > at a time, even if linguistically it does not make any sense.
> The Tibetan code in this case, U+0FA1, is NOT a character. It's a Tibetan
> code for combining with other Tibetan codes to form a Tibetan character.
> Unicode code points do not necessarily represent characters. Selecting
> the combined character is more expected than selecting its sub-parts
> (even when that is possible).
> This issue is about handling Unicode combining. In this case, Pango
> interprets a quote mark (") and U+0FA1 Tibetan code (wrong combination)
> as one combined character. I'm not sure whether it's a defined behavior.
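To make the code point versus character distinction above concrete, here is a
rough Python sketch. It only inspects Unicode properties through the standard
unicodedata module; it says nothing about what Pango does internally. The
helper names (describe, naive_clusters) are made up for illustration, and the
grouping rule (attach every mark of general category M* to the preceding code
point) is a deliberate simplification of the real grapheme cluster rules, so
treat it as an assumption rather than the actual algorithm.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, general category) per code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


def naive_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplified assumption: any code point whose general category starts
    with "M" (Mn, Mc, Me) is attached to the preceding cluster. A mark
    with nothing before it starts its own cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# The combination from this bug report: a quote mark followed by U+0FA1.
sample = '"\u0fa1'

for entry in describe(sample):
    print(entry)

print(len(sample), "code points,", len(naive_clusters(sample)), "cluster(s)")
```

Under this simplified rule the sample is two code points but a single
selectable unit, because U+0FA1 is a nonspacing mark (category Mn) and gets
attached to whatever precedes it, Tibetan or not.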
I did a bit of searching, and the selection behaviour seen makes sense. I
don't know if using Tibetan combining marks on non-Tibetan characters is
Basically, one has to be careful about the definition of 'character' that is