Bug#471021: locales: EastAsianAmbiguous character width is always 1 in UTF-8

To: d+deb@vdr.jp
Cc: 471021@bugs.debian.org
Subject: Bug#471021: locales: EastAsianAmbiguous character width is always 1 in UTF-8
From: "Masanori Goto" <gotom@sanori.org>
Date: Sun, 11 Jan 2009 23:53:21 +0900
Message-id: <de88ba480901110653j6790fbedyb38633d651fd4d06@mail.gmail.com>
Reply-to: "Masanori Goto" <gotom@sanori.org>, 471021@bugs.debian.org
In-reply-to: <20090111102012.GA13896@lilith.infoblue.home>
References: <814p09ag97.wl%gotom@sanori.org> <20090109052605.GA18356@lilith.infoblue.home> <de88ba480901101817m713d0da9r7f081e4a2690330@mail.gmail.com> <20090111102012.GA13896@lilith.infoblue.home>

It would be great if you could propose a good alternative way to do so.

2009/1/11  <d+deb@vdr.jp>:
> On Sun, Jan 11, 2009 at 11:17:48AM +0900, Masanori Goto wrote:
>> wcwidth() is a legacy function, so it cannot handle wide, RTL, and
>> combining characters correctly.  An environment variable to select its
>> behavior is one option, but it's just a hack and hard to specify in libc.
>>
>> According to the UAX#11 definition, we should return
>> 1 for EastAsianAmbiguous characters unless a rigid signal
>> (like "language tag, script identification, associated font, source of data")
>> is available in UTF-8.  We can certainly introduce that kind of change
>> for SJIS/EUC-JP, but it's hard to decide for ja_JP.UTF-8.
>>
>> Overall, we have no way to extend wcwidth() correctly,
>> so I think each application should handle the actual font size of characters
>> instead of using wcwidth().
>
> Thank you for your explanation.
>
> I understand that wcwidth() cannot be extended
> and that each application should be modified.
>
> But each application currently implements its own approach,
> for example its own code or various versions of Markus Kuhn's wcwidth.
> As a layman's idea, could libc offer a common method for this?
>
> Regards,
>        dai
> --
>
>
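
To make the UAX#11 rule discussed above concrete, here is a rough sketch of an application-side width function. It is only an illustration, not what libc or any particular application does. It is written in Python and relies on the standard library's unicodedata.east_asian_width() and unicodedata.combining(). The names char_width, display_width and the ambiguous_wide flag are hypothetical, and control characters are not treated specially.

```python
import unicodedata


def char_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Approximate column width of one character (illustrative only)."""
    # Combining marks occupy no column of their own. Check this first,
    # because many combining marks are themselves classified as Ambiguous.
    if unicodedata.combining(ch):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    # Wide (W) and Fullwidth (F) characters take two columns.
    if eaw in ("W", "F"):
        return 2
    # Ambiguous (A) characters default to one column, as UAX#11 suggests
    # when no stronger signal is available. The caller may opt in to two
    # columns, e.g. for a legacy CJK context (hypothetical flag).
    if eaw == "A":
        return 2 if ambiguous_wide else 1
    # Everything else (N, Na, H) is treated as one column in this sketch.
    return 1


def display_width(text: str, ambiguous_wide: bool = False) -> int:
    """Sum of the per-character widths of text."""
    return sum(char_width(ch, ambiguous_wide) for ch in text)
```

With this sketch, a caller that knows it is rendering in an East Asian context could pass ambiguous_wide=True, while the default follows the "return 1" rule for UTF-8 described in the quoted message.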
