
Bug#959474: Issues with Chinese language (all variants) when building some pages in buster


On Tue, 2020-05-05 at 03:34 +0200, Axel Beckert wrote:
> → echo 包 | perl -pe 's|\s+\n|\n|sg;'
> 包
> → echo 包 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
> �
> Which kinda sounds like a Perl bug. Cc'ing the maintainers of Debian's
> perl package (not the whole Debian Perl Team), maybe they have some
> insight what actually goes wrong here and if that's indeed a Perl bug.

I guess it is a Perl bug. Here are more Chinese characters, besides "包",
that trigger the problem:

% echo 包 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
% echo 赠 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
% echo 传 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
% echo 阅 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
% echo 加 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
% echo 者 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'

% echo -n 赠 | hexdump -C
00000000  e8 b5 a0
% echo -n 传 | hexdump -C
00000000  e4 bc a0
% echo -n 包 | hexdump -C
00000000  e5 8c 85
% echo -n 阅 | hexdump -C
00000000  e9 98 85
% echo -n 加 | hexdump -C
00000000  e5 8a a0
% echo -n 者 | hexdump -C
00000000  e8 80 85

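(Note the trailing 0xA0 or 0x85 byte in each case. Read as Latin-1, these
are U+00A0 NO-BREAK SPACE and U+0085 NEXT LINE, both of which count as
whitespace under Unicode rules.)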
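
To illustrate why a trailing 0xA0 or 0x85 byte might matter, here is a
minimal sketch in Python (not Perl, and not a claim about how perl
implements this internally). It assumes the input is never decoded from
UTF-8, so each byte is treated as one Latin-1 character, and that \s is
matched with Unicode rules. The helper name strip_trailing_space_bytewise
is made up for this example.

```python
import re


def strip_trailing_space_bytewise(raw: bytes) -> bytes:
    """Apply the equivalent of s|\\s+\\n|\\n|sg to undecoded bytes.

    Assumption: every byte is read as one Latin-1 character, which is
    roughly how an undecoded byte string looks to a regex engine that
    applies Unicode whitespace rules.
    """
    text = raw.decode("latin-1")
    # In Python, \s on str matches U+0085 and U+00A0 as whitespace.
    text = re.sub(r"\s+\n", "\n", text)
    return text.encode("latin-1")


if __name__ == "__main__":
    # The six characters from the list above, each followed by a newline.
    for ch in "包赠传阅加者":
        raw = (ch + "\n").encode("utf-8")
        out = strip_trailing_space_bytewise(raw)
        print(ch, raw.hex(), "->", out.hex())
```

Under these assumptions the last byte of each character is removed as
"whitespace": for 包 the bytes e5 8c 85 0a become e5 8c 0a, which is no
longer valid UTF-8, and a terminal would show that as a replacement
character. A character such as 中 (e4 b8 ad) passes through unchanged,
since none of its bytes is a Latin-1 whitespace character.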

Mwei (https://nm.debian.org/person/mwei/) just told me that it could be a
bug in the isSPACE_L1 macro in perl's pp.c. He will reply to this email
soon.

Boyuan Yang

