Bug#959474: Bug#959761: Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
- To: Damyan Ivanov <dmn@debian.org>, 959761@bugs.debian.org, Boyuan Yang <byang@debian.org>, Holger Wansing <hwansing@mailbox.org>, 959474@bugs.debian.org, Laura Arjona Reina <larjona@debian.org>, debian-l10n-chinese@lists.debian.org, debian-i18n@lists.debian.org, perl@packages.debian.org
- Subject: Bug#959474: Bug#959761: Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
- From: gregor herrmann <gregoa@debian.org>
- Date: Tue, 5 May 2020 12:16:17 +0200
- Message-id: <20200505101617.GP12600@colleen.colgarra.priv.at>
- Reply-to: gregor herrmann <gregoa@debian.org>, 959474@bugs.debian.org
- In-reply-to: <20200505085328.6jqtzcaxkluhhl6e@sym.noone.org>
- References: <15d8a46a-6264-2bb7-d952-f4deaa7a38ef@debian.org> <15d8a46a-6264-2bb7-d952-f4deaa7a38ef@debian.org> <20200503225739.5484cb41fc877994ebb89ce5@mailbox.org> <295462cf3518b51c2c8b1f516934033748b5159c.camel@debian.org> <20200505010058.tqoss44lmgy5jneh@sym.noone.org> <20200505013426.hf4e2za5xqomz4af@sym.noone.org> <15d8a46a-6264-2bb7-d952-f4deaa7a38ef@debian.org> <20200505054510.ndppc4gxea5iwgi7@fbd7c150-3361-11e8-8c11-5badabdd4a8d> <20200505085328.6jqtzcaxkluhhl6e@sym.noone.org> <15d8a46a-6264-2bb7-d952-f4deaa7a38ef@debian.org>
On Tue, 05 May 2020 10:53:29 +0200, Axel Beckert wrote:
> > Perhaps the strings in wml need to be decoded from UTF-8 so that they
> > aren't treated as a sequence of independent bytes?
> ... and would have expect "use feature unicode_strings;" already
> activates all of this.
(I haven't read the thread in detail …).
Personally I often use "use utf8::all" (from libutf8-all-perl) if I'm
reasonably sure that the input is not weird and I want to output
UTF-8. In my experience it is sometimes a bit slow, but it handles all
the en/decoding.
> > Explicitly using Encode helps:
> >
> > echo 包 | perl -E 'use Encode qw(decode_utf8); while(<>) { $_ = decode_utf8($_); s|\s+\n|\n|sg; print }'
> > Wide character in print at -e line 1, <> line 1.
> > 包
% time echo 包 | perl -E 'use Encode qw(decode_utf8); while(<>) { $_ = decode_utf8($_); s|\s+\n|\n|sg; print }'
Wide character in print at -e line 1, <> line 1.
包
echo 包 0.00s user 0.00s system 42% cpu 0.002 total
perl -E 0.03s user 0.01s system 97% cpu 0.034 total
% time echo 包 | perl -Mutf8::all -E ' while(<>) { s|\s+\n|\n|sg; print }'
包
echo 包 0.00s user 0.00s system 63% cpu 0.002 total
perl -Mutf8::all -E ' while(<>) { s|\s+\n|\n|sg; print }' 0.04s user 0.01s system 98% cpu 0.050 total
% time echo 包 | perl -CS -E 'while(<>) { s|\s+\n|\n|sg; print }'
包
echo 包 0.00s user 0.00s system 60% cpu 0.002 total
perl -CS -E 'while(<>) { s|\s+\n|\n|sg; print }' 0.00s user 0.00s system 83% cpu 0.005 total
Cheers,
gregor
--
.''`. https://info.comodo.priv.at -- Debian Developer https://www.debian.org
: :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D 85FA BB3A 6801 8649 AA06
`. `' Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
`- BOFH excuse #378: Operators killed by year 2000 bug bite.