Re: Bug#496266: UTF-8 string characters not properly recognized

To: Christian Perrier <bubulle@debian.org>
Cc: 496266@bugs.debian.org, debian-i18n <debian-i18n@lists.debian.org>, pema.geyleg@gmail.com, tenzin.dendup@gmail.com
Subject: Re: Bug#496266: UTF-8 string characters not properly recognized
From: Adam Majer <adamm@zombino.com>
Date: Tue, 02 Sep 2008 13:19:11 -0500
Message-id: <48BD839F.6010202@zombino.com>
In-reply-to: <20080902160656.GS3868@mykerinos.kheops.frmug.org>
References: <20080824005916.2848.97150.reportbug@mira.lan.galacticasoftware.com> <1220352330.4197.22.camel@shizuru> <20080902160656.GS3868@mykerinos.kheops.frmug.org>

Christian Perrier wrote:

On Saturday 23 August 2008 at 19:59 -0500, Adam Majer wrote:

Package: gedit
Version: 2.22.3-1
Severity: normal

The following UTF-8 string is not correctly handled in gedit,

const char *unicode_insert = "?Э";

The " and the ? characters are viewed as one character, making the
entire thing next to impossible to copy/paste/edit.

Looks like an issue in pango, since it is not specific to gedit.

Such things seem to happen a lot when using Tibetan characters, so this
may or may not be intentional. I’d prefer to have the input of someone
who uses them. Is there anyone on debian-i18n who’s more knowledgeable
about Tibetan glyphs?



Adding Pema Geyleg and Tenzin Dendup, our fellow Dzongkha translation
coordinators, who certainly know Tibetan-family scripts (Dzongkha is
one of these) and may be able to point you to people with the needed
knowledge.



I'm sorry, but aren't we missing the entire point here? This is not
about bad handling of some Tibetan characters. It is about bad handling
of 3-byte UTF-8 characters.

http://en.wikipedia.org/wiki/UTF-8
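
For reference, a minimal Python sketch of how many bytes UTF-8 needs per
code point. The helper name utf8_len and the sample characters are chosen
here purely for illustration; they are not part of the bug report.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes UTF-8 needs to encode one code point."""
    return len(ch.encode("utf-8"))

# Illustrative samples: ASCII quote, Cyrillic E, Devanagari short A,
# Kannada sign anusvara.
samples = ['"', "\u042d", "\u0904", "\u0c82"]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")
```

Code points from U+0800 through U+FFFF take three bytes, which covers
the Indic blocks used in the test lines.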

So, the following characters should have the same problems,

"ऄक

"ঈউঊ

"ਜਗਏ

"ଜଁଂ

"ஔ

"ంఁః

"ಂಖ

"ഈഃ

etc.

I've put an ASCII " in front of each group of characters. In emacs, I'm
able to select the " in front of these characters and copy it. vim
under a UTF-8 GNOME terminal also allows the " to be selected. On the
second-to-last line above (using icedove), I can't select the "
independently, but I can select the " and ಂ together and then remove
the second character.

Maybe it is just my misunderstanding of UTF-8, I'm not sure. But at
least my expected behaviour was being able to select one UTF-8 character
at a time, even if linguistically it does not make any sense.
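
One possible explanation, offered as an assumption and not confirmed
anywhere in this thread: ಂ (U+0C82) is a combining sign and not a
letter, so a toolkit that moves the cursor by cluster would attach it to
the preceding ". The sketch below uses Python's unicodedata module to
show the difference between code points and clusters. naive_clusters is
a hypothetical helper and a deliberate simplification of real
grapheme-cluster segmentation; it says nothing about how pango itself is
implemented.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    """True if the code point is in a Unicode mark category (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def naive_clusters(text: str) -> list:
    """Group each mark with the code point before it (simplified rule)."""
    clusters = []
    for ch in text:
        if clusters and is_mark(ch):
            clusters[-1] += ch   # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # any non-mark starts a new cluster
    return clusters

# The second-to-last test line: ASCII quote, U+0C82, U+0C96.
line = '"\u0c82\u0c96'
for ch in line:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
print(len(line), "code points,", len(naive_clusters(line)), "clusters")
```

Under this simplified rule the " and the sign that follows it land in
the same cluster, which would match what I see in icedove on that line.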


- Adam
