
Bug#959474: Issues with Chinese language (all variants) when building some pages in buster



On Mon, May 04, 2020 at 10:19:02PM -0400, Boyuan Yang wrote:
> Mwei (https://nm.debian.org/person/mwei/) just talked to me saying that it
> could be a bug with isSPACE_L1 macro in perl's pp.c. He will be replying the
> email soon.
> 

Hi,

(I used reportbug to reply to this thread and missed a lot of
recipients.  This is a resend of my reply to #959474.  Sorry for the
noise.)

After a bit of investigation of the Perl source code (5.31.11, downloaded
from upstream), I found that it has odd handling of whitespace when
`feature unicode_strings` is turned on.  I am not a Perl person and I
haven't run the code yet, so my interpretation might be
wrong.

When `unicode_strings` is on, `in_uni_8_bit` should be true internally, and
in three places in pp.c (pp.c:6040, pp.c:6076, pp.c:6114) `isSPACE_L1` is
called to check whether the character being examined is whitespace, by
checking whether it is 0x85 or 0xA0 (handy.h:1611).  In the
case of the character 包, the last byte of its 3-byte UTF-8 encoding is
0x85, hence the problem.
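
To make the byte-level collision concrete, here is a small Python sketch
(not Perl, and not the actual macro).  The helper names `is_space_l1` and
`utf8_bytes_seen_as_space` are made up for illustration; the predicate only
imitates a Latin-1 whitespace test as I understand it, under the assumption
that the check is applied to individual bytes of the UTF-8 encoding.

```python
# Hypothetical stand-in for a Latin-1 whitespace test; NOT Perl's real macro.
LATIN1_EXTRA_SPACES = {0x85, 0xA0}  # NEL and NBSP in Latin-1


def is_space_l1(byte):
    """Return True if a single byte value counts as Latin-1 whitespace."""
    return chr(byte) in " \t\n\r\f\v" or byte in LATIN1_EXTRA_SPACES


def utf8_bytes_seen_as_space(text):
    """Offsets of UTF-8 bytes that a byte-wise Latin-1 check calls whitespace."""
    return [i for i, b in enumerate(text.encode("utf-8")) if is_space_l1(b)]


encoded = "包".encode("utf-8")
print(encoded.hex(" "))                # e5 8c 85
print(utf8_bytes_seen_as_space("包"))  # [2]
```

The last byte of 包 (offset 2) is flagged, which would match the symptom if
the check really is applied to raw UTF-8 bytes rather than to decoded
characters.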


