Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
Package: www.debian.org
Severity: normal
User: www.debian.org@packages.debian.org
Usertags: scripts
X-Debbugs-CC: debian-l10n-chinese@lists.debian.org
X-Debbugs-CC: debian-i18n@lists.debian.org
Hi all,
TL;DR
Some Chinese pages show issues when they are built on a
buster machine.
We need to fix those issues (at least the "Malformed UTF-8 character
[...] at ../../bin/tocn.pl [...]" ones) so DSA can upgrade the
www-master machine to buster. See the summary of the log at the bottom
to know which files produce this error.
I have no idea how to fix these issues, so any help from the Chinese
team or web team members is greatly appreciated.
Additional issues may arise (e.g. I have not yet tested the release-notes
or doc-manual). Any help with testing is welcome too; please create a bug
report for each distinct issue or update the existing ones. Thanks!
LONG VERSION
I've done a test build of the /english and /chinese subdirs on a buster
machine, and I have noticed some warnings/errors related to the Chinese
pages (some, not all of them).
It would be desirable to upgrade the www-master machine to buster as soon
as possible, so any help with this (from website or Chinese team members)
would be much appreciated.
Below you can find an extract of the build log, including only the
files for which I got some error or warning message.
After the build, I compared the problematic HTML files from a stretch
build and a buster build with a diff tool, to see if there were
significant changes in the HTML output due to these issues.
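A minimal sketch of such a comparison, assuming the two builds were
written to two separate output directories. The function name
diff_builds and the directory arguments are hypothetical, and this is
not the diff tool that was actually used for the results reported here.

```python
import difflib
from pathlib import Path


def diff_builds(stretch_dir, buster_dir, pattern="*.html"):
    """Compare HTML files with the same relative path in two build trees.

    Returns a dict keyed by relative path. The value is a list of
    unified-diff lines for files that differ, or None when the file
    exists only in the first tree. Identical files are left out.
    """
    stretch_dir, buster_dir = Path(stretch_dir), Path(buster_dir)
    report = {}
    for old in sorted(stretch_dir.rglob(pattern)):
        rel = old.relative_to(stretch_dir)
        new = buster_dir / rel
        if not new.exists():
            report[rel.as_posix()] = None
            continue
        # Decode leniently so that a broken page still yields a diff
        # instead of aborting the whole comparison.
        old_lines = old.read_bytes().decode("utf-8", errors="replace").splitlines()
        new_lines = new.read_bytes().decode("utf-8", errors="replace").splitlines()
        delta = list(
            difflib.unified_diff(
                old_lines,
                new_lines,
                fromfile="stretch/" + rel.as_posix(),
                tofile="buster/" + rel.as_posix(),
                lineterm="",
            )
        )
        if delta:
            report[rel.as_posix()] = delta
    return report
```

Calling diff_builds("www-stretch", "www-buster", "*.zh-*.html") (both
directory names are placeholders) would limit the comparison to the
Chinese variants.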
Here are my results:
* For the messages of the type ", [zh_TW]Invalid UTF8: " when building,
I could not see any difference between the output of a stretch build and
the output of a buster build.
I would say this is not a blocker for the buster upgrade of www-master.
* For the messages of the type "Malformed UTF-8 character [...] at
../../bin/tocn.pl [...]", I have seen significant changes in the HTML diff.
I think the output in the stretch build is totally broken (fortunately,
there are not many files in that situation).
I would say this is a blocker for the buster upgrade of www-master, but
I would prefer somebody from the Chinese team to confirm (try to build
those files on a buster machine, and review the output).
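Several of the "Malformed UTF-8 character" messages in the log report a
multi-byte character cut short by a newline (for example \xe9\x98\x0a,
"need 3 bytes, got 2"). The sketch below, offered only as a possible
starting point, lists the lines of a byte string that are not valid
UTF-8 on their own. It assumes the converter scripts read their input
line by line, as the "<> line N" part of the messages suggests. The
function name find_bad_lines is hypothetical, and the check could be
pointed at a .wml source or at any intermediate output one manages to
capture.

```python
def find_bad_lines(data: bytes):
    """Return (line_number, reason) for each line that is not valid UTF-8.

    Every line is decoded separately, so a multi-byte character that is
    interrupted by a newline is reported on the line where it starts.
    """
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.reason is Python's own description of the problem.
            bad.append((number, err.reason))
    return bad
```

For a file on disk, find_bad_lines(open(path, "rb").read()) returns an
empty list when every line decodes cleanly.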
Additional notes:
* I have only tested the wml build, not the rest of the cron scripts
that run on www-master. I will try to do it in the following days, but
if you already know of any that work well (e.g. release-notes,
doc-manuals...) just tell me so I can skip them.
* When I build files on my machines, something is wrong in my
environment and the .po files are not always integrated, so for
example the Chinese pages I build show the menus and footnote in
English. Therefore, if there is any issue with the encoding of the .po
files themselves, I guess I cannot detect it until I fix my particular
issue :/
* The local build that I make uses the SAMPLE_FILES that are needed in
some folders; so additional issues may arise when we use the actual
files that are generated at runtime in the often and lessoften cron jobs.
That's all for now, I think. Thanks for your patience in reading this
and for your help!
Kind regards,
--
Laura Arjona Reina
https://wiki.debian.org/LauraArjona
--- extract of the build log file
/chinese
Processing
donations.wml:
[zh_CN]Invalid UTF8:
ïŒç¹å»âæ·»å å°èŽç©èœŠâïŒç¶åå®æå©äœè¿çšã
, [zh_TW]Invalid UTF8:
ïŒç¹å»âæ·»å å°èŽç©èœŠâïŒç¶åå®æå©äœè¿çšã
, [zh_HK]Invalid UTF8:
ïŒç¹å»âæ·»å å°èŽç©èœŠâïŒç¶åå®æå©äœè¿çšã
.
make[1]: Entering directory '/webwml/chinese/Bugs'
Processing Reporting.wml: [zh_CN]Invalid UTF8:
°äžæ¬¡ç€ºäŸäŒè¯çè¿çšã</li>
, [zh_TW]Invalid UTF8: °äžæ¬¡ç€ºäŸäŒè¯çè¿çšã</li>
, [zh_HK]Invalid UTF8: °äžæ¬¡ç€ºäŸäŒè¯çè¿çšã</li>
.
make[2]: Entering directory '/webwml/chinese/News/2000'
Processing 20000815.wml:
[zh_CN]Invalid UTF8: µ·å€æåçéŒååå©ïŒå
æ¬ïŒ
, [zh_TW]Invalid UTF8: µ·å€æåçéŒååå©ïŒå
æ¬ïŒ
, [zh_HK]Invalid UTF8: µ·å€æåçéŒååå©ïŒå
æ¬ïŒ
.
make[2]: Entering directory '/webwml/chinese/News/2009'
Processing 20090214.wml: [zh_CN]Invalid UTF8: Sun SPARC (sparc)ã
, [zh_TW]Invalid UTF8: Sun SPARC (sparc)ã
, [zh_HK]Invalid UTF8: Sun SPARC (sparc)ã
.
make[2]: Entering directory '/webwml/chinese/News/weekly'
copying index.zh-cn.html to ../../../../www/News/weekly/./2002/48
Processing index.wml: [zh_CN]Malformed UTF-8 character (unexpected end
of string) in substitution (s///) at ../../bin/tocn.pl line 13, <> line 146.
Malformed UTF-8 character (unexpected end of string) in substitution
(s///) at ../../bin/tocn.pl line 15, <> line 146.
panic: do_trans_simple_utf8 line 362 at ../../bin/tocn.pl line 20, <>
line 146.
, [zh_TW]Invalid UTF8: å
, [zh_HK]Invalid UTF8: å
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2002/49
copying index.zh-cn.html to ../../../../www/News/weekly/./2003/09
Processing index.wml: [zh_CN]Invalid UTF8: æªæè¿°äºåŸå®è£
, [zh_TW]Invalid UTF8: 件æè¿°äºåŸå®è£
, [zh_HK]Invalid UTF8: 件æè¿°äºåŸå®è£
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2003/10
Processing index.wml: [zh_CN]Invalid UTF8: 们ç<a
href="../../../../events/talks">æŒè®²é¡µé¢</a>æ¥åïŒ
, [zh_TW]Invalid UTF8: 们ç<a
href="../../../../events/talks">æŒè®²é¡µé¢</a>æ¥åïŒ
, [zh_HK]Invalid UTF8: 们ç<a
href="../../../../events/talks">æŒè®²é¡µé¢</a>æ¥åïŒ
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2012/15
make[1]: Entering directory '/webwml/chinese/devel'
Processing
testing.wml:
[zh_CN],
[zh_TW]Invalid
UTF8: °äº 4
åäžæç®æŽæ°çè»ä»¶å
ïŒå ç²å®åæç Žå£äŸè³Žã<q>(0)</q> æ¯ç¡
, [zh_HK]Invalid
UTF8: °äº 4
åäžæç®æŽæ°çè»ä»¶å
ïŒå ç²å®åæç Žå£äŸè³Žã<q>(0)</q> æ¯ç¡
.
make[2]: Entering directory '/webwml/chinese/devel/join'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../../bin/tocn.pl line
108, <> line 52.
, [zh_TW], [zh_HK].
copying index.zh-cn.html to ../../../../www/devel/join
copying index.zh-hk.html to ../../../../www/devel/join
copying index.zh-tw.html to ../../../../www/devel/join
make[1]: Entering directory '/webwml/chinese/international'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../bin/tocn.pl line 108,
<> line 89.
, [zh_TW]Invalid UTF8:
çšåº
, [zh_HK]Invalid UTF8:
çšåº
.
make[2]: Entering directory '/webwml/chinese/international/Chinese'
Processing thanks.wml: [zh_CN]Invalid UTF8: «é»çæå
, [zh_TW]Invalid UTF8: «é»çæå
, [zh_HK]Invalid UTF8: «é»çæå
.
make[1]: Entering directory '/webwml/chinese/intro'
Processing about.wml: [zh_CN], [zh_TW], [zh_HK]panic: swash_fetch got
swatch of unexpected bit width, slen=512, needents=64 at ../bin/tohk.pl
line 131, <> line 95.
.
make -C legal install
make[1]: Entering directory '/webwml/chinese/legal'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../bin/tocn.pl line 108,
<> line 68.
, [zh_TW], [zh_HK].
copying index.zh-cn.html to ../../../www/legal
copying index.zh-hk.html to ../../../www/legal
copying index.zh-tw.html to ../../../www/legal
make[1]: Entering directory '/webwml/chinese/releases'
Processing proposed-updates.wml: [zh_CN],
[zh_TW]Invalid UTF8: èœæçµå°é proposed-updates
, [zh_HK]Invalid UTF8: èœæçµå°é proposed-updates
.
make[2]: Entering directory '/webwml/chinese/releases/hamm'
Processing HOWTO.upgrade.wml: [zh_CN], [zh_TW]Malformed UTF-8 character:
\xe5\x8c\x0a (unexpected non-continuation byte 0x0a, 2 bytes after start
byte 0xe5; need 3 bytes, got 2) in substitution (s///) at
../../bin/totw.pl line 111, <> line 71.
, [zh_HK].
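
To turn a build log like the extract above into a per-file summary, the
following sketch may help. It is only an illustration: the two regular
expressions are assumptions based on the line shapes visible in this
extract ("Entering directory '...'" and "Processing NAME.wml:"), the
function name summarize_log is hypothetical, and subdirectories that
only show up in the "copying ... to ..." lines are not tracked.

```python
import re

# Either a directory change announced by make, or the start of a
# "Processing NAME.wml:" entry (possibly wrapped across lines).
TOKEN = re.compile(r"Entering directory '([^']+)'|Processing (\S+\.wml):")


def summarize_log(log_text):
    """Return a list of (path, kinds) for files with warnings or errors.

    kinds is a sorted list drawn from "invalid" (Invalid UTF8),
    "malformed" (Malformed UTF-8 character) and "panic".
    """
    # Collapse all whitespace so that wrapped log lines are rejoined.
    text = " ".join(log_text.split())
    matches = list(TOKEN.finditer(text))
    directory = "?"  # unknown until the first "Entering directory"
    summary = []
    for i, match in enumerate(matches):
        if match.group(1):
            directory = match.group(1)
            continue
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunk = text[match.end():end]
        kinds = set()
        if "Invalid UTF8" in chunk:
            kinds.add("invalid")
        if "Malformed UTF-8" in chunk:
            kinds.add("malformed")
        if "panic:" in chunk:
            kinds.add("panic")
        if kinds:
            summary.append((directory + "/" + match.group(2), sorted(kinds)))
    return summary
```

Filtering the result for entries whose kinds include "malformed" or
"panic" gives the files that matter for the possible blocker.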