Bug#959474: Bug#959761: Bug#959474: Issues with Chinese language (all variants) when building some pages in buster

To: Damyan Ivanov <dmn@debian.org>, 959761@bugs.debian.org, Boyuan Yang <byang@debian.org>, Holger Wansing <hwansing@mailbox.org>, 959474@bugs.debian.org, Laura Arjona Reina <larjona@debian.org>, debian-l10n-chinese@lists.debian.org, debian-i18n@lists.debian.org, perl@packages.debian.org
Subject: Bug#959474: Bug#959761: Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
From: gregor herrmann <gregoa@debian.org>
Date: Tue, 5 May 2020 12:16:17 +0200
Message-id: <20200505101617.GP12600@colleen.colgarra.priv.at>
Reply-to: gregor herrmann <gregoa@debian.org>, 959474@bugs.debian.org
In-reply-to: <20200505085328.6jqtzcaxkluhhl6e@sym.noone.org>
References: <15d8a46a-6264-2bb7-d952-f4deaa7a38ef@debian.org> <15d8a46a-6264-2bb7-d952-f4deaa7a38ef@debian.org> <20200503225739.5484cb41fc877994ebb89ce5@mailbox.org> <295462cf3518b51c2c8b1f516934033748b5159c.camel@debian.org> <20200505010058.tqoss44lmgy5jneh@sym.noone.org> <20200505013426.hf4e2za5xqomz4af@sym.noone.org> <15d8a46a-6264-2bb7-d952-f4deaa7a38ef@debian.org> <20200505054510.ndppc4gxea5iwgi7@fbd7c150-3361-11e8-8c11-5badabdd4a8d> <20200505085328.6jqtzcaxkluhhl6e@sym.noone.org> <15d8a46a-6264-2bb7-d952-f4deaa7a38ef@debian.org>

On Tue, 05 May 2020 10:53:29 +0200, Axel Beckert wrote:

> > Perhaps the strings in wml need to be decoded from UTF-8 so that they 
> > aren't treated as a sequence of independent bytes?
> ... and would have expected "use feature unicode_strings;" to already
> activate all of this.

(I haven't read the thread in detail …).

Personally I often use "use utf8::all" (from libutf8-all-perl) if I'm
reasonably sure that the input is not weird and I want to output
UTF-8. It is sometimes a bit slow, but in my experience it handles all
the en/decoding.
 
> > Explicitly using Encode helps:
> > 
> >  echo 包 | perl -E 'use Encode qw(decode_utf8); while(<>) { $_ = decode_utf8($_); s|\s+\n|\n|sg; print }'
> >  Wide character in print at -e line 1, <> line 1.
> >  包

% time echo 包 | perl -E 'use Encode qw(decode_utf8); while(<>) { $_ = decode_utf8($_); s|\s+\n|\n|sg; print }'
Wide character in print at -e line 1, <> line 1.
包
echo 包  0.00s user 0.00s system 42% cpu 0.002 total
perl -E   0.03s user 0.01s system 97% cpu 0.034 total

% time echo 包 | perl -Mutf8::all -E ' while(<>) { s|\s+\n|\n|sg; print }'
包
echo 包  0.00s user 0.00s system 63% cpu 0.002 total
perl -Mutf8::all -E ' while(<>) { s|\s+\n|\n|sg; print }'  0.04s user 0.01s system 98% cpu 0.050 total

% time echo 包 | perl -CS -E 'while(<>) { s|\s+\n|\n|sg; print }'
包
echo 包  0.00s user 0.00s system 60% cpu 0.002 total
perl -CS -E 'while(<>) { s|\s+\n|\n|sg; print }'  0.00s user 0.00s system 83% cpu 0.005 total


Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   BOFH excuse #378:  Operators killed by year 2000 bug bite.
