Bug#471021: locales: EastAsianAmbiguous character width is always 1 in UTF-8

To: GOTO Masanori <gotom@sanori.org>
Cc: 471021@bugs.debian.org
Subject: Bug#471021: locales: EastAsianAmbiguous character width is always 1 in UTF-8
From: d+deb@vdr.jp
Date: Fri, 9 Jan 2009 14:26:05 +0900
Message-id: <20090109052605.GA18356@lilith.infoblue.home>
Reply-to: d+deb@vdr.jp, 471021@bugs.debian.org
In-reply-to: <814p09ag97.wl%gotom@sanori.org>
References: <814p09ag97.wl%gotom@sanori.org>

On Fri, Jan 09, 2009 at 01:56:20AM +0900, GOTO Masanori wrote:
> I don't agree with the concept of "UTF-8-CJK" because it's
> exaggerated.  Is it a locale-dependent issue, or a character encoding
> issue?

I regard the ``UTF-8-CJK'' locale as just a workaround.
Nothing would be better than using only the plain UTF-8 locale.

> According to UAX#11, your point doesn't make sense because your
> reference just mentions character mapping.  Instead, the "When
> processing or displaying data" section says,
> 
> "Ambiguous characters behave like wide or narrow characters depending
> on the context (language tag, script identification, associated font,
> source of data, or explicit markup; all can provide the context). If
> the context cannot be established reliably, they should be treated as
> narrow characters by default."

I see.

> If all the legacy applications use wcwidth() supposing the width of
> ambiguous characters = 2, it's OK to introduce your idea - but I'm not
> sure whether that's true or not.
> 
> A font rendering application should basically consider the font size.
> Why doesn't rxvt consider such font rendering size?  Or should
> we introduce a special environment variable or locale tag to decide the
> behavior of the wcwidth() value for ambiguous characters?

For the sake of legacy applications, it would be good for the width
to be selectable.
If legacy applications are negligible,
a single fixed setting is more convenient for users.
But libc and locales are so fundamental
that their rigidity could itself become a cause for concern...
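
To make the ``selectable'' idea concrete, here is a minimal sketch in
Python. It is only an illustration, not how glibc's wcwidth() works:
the function names and the AMBIGUOUS_WIDTH environment variable are
hypothetical, and the East Asian Width classes come from Python's
unicodedata module rather than from the locale data.

```python
import os
import unicodedata


def char_width(ch, ambiguous_width=1):
    """Column width of one character (simplified sketch).

    Control characters are not handled; combining marks count as 0.
    """
    if unicodedata.combining(ch):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):      # Wide and Fullwidth always take 2 columns
        return 2
    if eaw == "A":             # Ambiguous: width depends on the context
        return ambiguous_width
    return 1                   # Narrow, Halfwidth, Neutral


def ambiguous_width_from_env(environ=os.environ):
    """Read the hypothetical AMBIGUOUS_WIDTH variable; default is narrow (1)."""
    return 2 if environ.get("AMBIGUOUS_WIDTH") == "2" else 1


def string_width(text, ambiguous_width=1):
    """Sum of the column widths of all characters in text."""
    return sum(char_width(ch, ambiguous_width) for ch in text)
```

With this sketch, U+00B1 (PLUS-MINUS SIGN, class A) is 1 column by
default, as UAX#11 recommends when the context is unknown, and 2
columns only when the user asks for it.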

Regards,
	dai
--
