
Bug#471021: locales: EastAsianAmbiguous character width is always 1 in UTF-8



wcwidth() is a legacy function, so it cannot handle wide, RTL and
combining characters correctly.  An environment variable to select its
behavior is one option, but it's just a hack and it's hard to specify in libc.

According to the UAX#11 definition, we should return
1 for EastAsianAmbiguous characters unless a reliable signal
(like "language tag, script identification, associated font, source of data")
is available under UTF-8.  We could certainly introduce such a change
for SJIS/EUC-JP, but it's hard to decide for ja_JP.UTF-8.

Overall, we have no way to extend wcwidth() so that it is correct in
every case, so I think each application should handle the actual rendered
width of characters in its font instead of relying on wcwidth().
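
To illustrate what handling this in the application could look like, here
is a minimal sketch in Python.  It is not what glibc or rxvt does; the
function name display_width and the ambiguous_wide flag are made up for
this example.  It follows the UAX#11 default (ambiguous characters are
narrow) and lets the caller override that when it has reliable context,
such as a known CJK font.  Control characters are not handled.

```python
import unicodedata


def display_width(text, ambiguous_wide=False):
    """Estimate the number of terminal columns needed for text.

    ambiguous_wide is a hypothetical switch: set it to True only when
    the application knows (from its font, for example) that East Asian
    Ambiguous characters are drawn two columns wide.
    """
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks attach to the previous character.
            continue
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):
            # Wide and Fullwidth characters take two columns.
            total += 2
        elif eaw == "A":
            # Ambiguous: narrow by default, wide only on request.
            total += 2 if ambiguous_wide else 1
        else:
            total += 1
    return total
```

With this sketch, display_width("α") gives 1 by default and 2 with
ambiguous_wide=True, while a wide character such as "あ" always counts
as 2.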

2009/1/9  <d+deb@vdr.jp>:
> On Fri, Jan 09, 2009 at 01:56:20AM +0900, GOTO Masanori wrote:
>> I don't agree with the concept of "UTF-8-CJK" because it goes
>> too far.  Is it a locale-dependent issue, or a character encoding
>> issue?
>
> I treat the ``UTF-8-CJK'' locale as just a workaround.
> Nothing could be better than using only a UTF-8 locale.
>
>> According to UAX#11, your point doesn't make sense because your
>> reference only mentions character mapping.  Instead, the "When
>> processing or displaying data" section says,
>>
>> "Ambiguous characters behave like wide or narrow characters depending
>> on the context (language tag, script identification, associated font,
>> source of data, or explicit markup; all can provide the context). If
>> the context cannot be established reliably, they should be treated as
>> narrow characters by default."
>
> I see.
>
>> If all legacy applications use wcwidth() assuming that the width of
>> ambiguous characters is 2, it's OK to introduce your idea - but I'm
>> not sure whether that is true.
>>
>> A font rendering application should basically consider the font size.
>> Why doesn't rxvt consider the rendered size of the font?  Or should
>> we introduce a special environment variable or locale tag to decide
>> the value wcwidth() returns for ambiguous characters?
>
> For the sake of legacy applications, it is good for this to be selectable.
> If legacy applications are negligible,
> a fixed setting is better for users' convenience.
> But libc and locales are so fundamental
> that their rigidity may be a cause for concern...
>
> Regards,
>        dai


