Re: Bug#496266: UTF-8 string characters not properly recognized

To: debian-i18n@lists.debian.org
Subject: Re: Bug#496266: UTF-8 string characters not properly recognized
From: Matias D'Ambrosio <angasule@gmail.com>
Date: Wed, 3 Sep 2008 11:50:03 -0300
Message-id: <200809031150.03527.angasule@gmail.com>
In-reply-to: <1220393541.4690.55.camel@duncan>
References: <20080824005916.2848.97150.reportbug@mira.lan.galacticasoftware.com> <48BD839F.6010202@zombino.com> <1220393541.4690.55.camel@duncan>

On Tuesday 02 September 2008 19:12:21 Changwoo Ryu wrote:
> 2008-09-02 (Tue), 13:19 -0500, Adam Majer:
> > Maybe it is just my misunderstanding of UTF-8, I'm not sure. But at
> > least my expected behaviour was being able to select 1 UTF-8 character
> > at a time, even if linguistically it does not make any sense.
>
> The Tibetan code in this case, U+0FA1 is NOT a character. It's a Tibetan
> code for combining with other Tibetan codes to form a Tibetan character.
> Unicode code points do not necessarily represent characters. Selecting
> combined character is more expected than selecting its sub-parts (even
> when it's possible).
>
> This issue is about handling Unicode combining. In this case, Pango
> interprets a quote mark (") and U+0FA1 Tibetan code (wrong combination)
> as one combined character. I'm not sure whether it's a defined behavior.

 I did a bit of searching, and the selection behaviour seen makes sense: the
text is selected by grapheme cluster (a user-perceived character), not by
individual code point. What I don't know is whether using Tibetan combining
marks on non-Tibetan characters is allowed.
 Basically, one has to be careful about which definition of 'character' is
being used.
 http://www.unicode.org/faq/char_combmark.html#2
 http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
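
 To make the different meanings of 'character' concrete, here is a rough
Python sketch that counts the quote mark plus U+0FA1 three ways: as code
points, as UTF-8 bytes, and as clusters. The helper simple_clusters() is my own
hypothetical simplification, not Pango's algorithm and not the full set of
rules in UAX #29; it only attaches code points whose general category is a
mark (M*) to the preceding code point.

```python
import unicodedata


def simple_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplified illustration only: a code point whose general category
    starts with "M" (Mn, Mc, Me) is appended to the previous cluster.
    The real grapheme cluster rules in UAX #29 cover many more cases.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # combining mark: extend the previous cluster
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters


# The sequence from the bug report: a quote mark followed by U+0FA1.
sample = '"\u0fa1'

print("code points:", len(sample))
print("UTF-8 bytes:", len(sample.encode("utf-8")))
print("clusters:   ", len(simple_clusters(sample)))
```

 With this simplification the two code points end up in a single cluster,
which matches the selection behaviour described in the bug report, whether or
not the combination is linguistically meaningful.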
