Bug#826256: locales: wrong width for hexagrams (and possibly others) in 2.22



control: tag -1 + moreinfo

On 2016-06-03 19:29, Thorsten Glaser wrote:
> Package: locales
> Version: 2.22-0experimental0
> Severity: normal
> Tags: upstream
> 
> Starting with locales 2.22-0experimental0, some chars have the wrong
> width; downgrading locales to 2.21-9 fixes the bugs.
> 
> Test program:
> 
> tglase@tglase:~ $ cat x.c
> #define _XOPEN_SOURCE
> #include <locale.h>
> #include <stdio.h>
> #include <wchar.h>
> 
> #define D(x) printf("%04X %d\n",(x),wcwidth(x))
> 
> int
> main(void)
> {
> 	setlocale(LC_ALL, "");
> 
> 	D(0x41);
> 	D(0x0300);
> 	D(0x3000);
> 	D(0x4DC0);
> 	D(0xFFFD);
> 	return (0);
> }
> tglase@tglase:~ $ gcc x.c
> tglase@tglase:~ $ rm -rf tloc; mkdir tloc
> tglase@tglase:~ $ localedef -i en_US -c -f UTF-8 tloc/en_US.UTF-8
> tglase@tglase:~ $ LOCPATH=$PWD/tloc LC_ALL=en_US.UTF-8 ./a.out
> 0041 1
> 0300 0
> 3000 2
> 4DC0 1
> FFFD 1
> 
> Output while locales_2.21-9_all.deb was installed during localedef:
> 
> tglase@tglase:~ $ LOCPATH=$PWD/tlocx LC_ALL=en_US.UTF-8 ./a.out
> 0041 1
> 0300 0
> 3000 2
> 4DC0 2
> FFFD 1
> 
> This is because /usr/share/i18n/charmaps/UTF-8.gz now lacks
> entries for 4DC0‥4FFF.
> 
> According to my own code implementing Unicode in another operating
> system, with focus on wcwidth(3), after parsing EastAsianWidth.txt
> special handling is needed to set widths of 0xFF00, 0x3248‥0x324F,
> and 0x4DC0‥0x4DFF to “wide”, as they’re “neutral” normally – which

EastAsianWidth.txt explicitly lists the hexagrams as neutral width, so I
don't think there is a bug there. Versions from Unicode 3.0 and earlier
didn't specify those characters; the behaviour of glibc 2.21 probably
comes from there and is probably wrong.

Looking at the behaviour of other systems, FreeBSD and NetBSD both
return -1 here, while OpenBSD returns 1. None of them returns 2.

Therefore, can you please give a pointer explaining why the width
should be 2 instead of 1?

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net
