
Bug#959474: Issues with Chinese language (all variants) when building some pages in buster



Package: www.debian.org
Severity: normal
User: www.debian.org@packages.debian.org
Usertags: scripts
X-Debbugs-CC: debian-l10n-chinese@lists.debian.org
X-Debbugs-CC: debian-i18n@lists.debian.org

Hi all,

TL;DR

Some Chinese pages produce errors or warnings when they are built on a
buster machine.
We need to fix those issues (at least the "Malformed UTF-8 character
[...] at ../../bin/tocn.pl [...]" ones) so DSA can upgrade the
www-master machine to buster. See the extract of the build log at the
bottom to know which files produce this error.
I have no idea how to fix the issues, so any help from the Chinese
team or web team mates is greatly appreciated.
Additional issues may arise (e.g. I have not yet tested the release-notes
or doc-manual), so help with testing is welcome too. Please create a bug
report for each different issue, or update the existing ones. Thanks!
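
To get the list of affected files out of a full build log, something
like the following could help. It is only a minimal sketch: the function
name is made up for illustration, and it assumes the log has the same
shape as the extract at the bottom ("Entering directory" lines followed
by "Processing <file>.wml:" entries).

```python
import re

# One pattern with three alternatives, scanned in order over the whole log:
#   dir  -> make's "Entering directory '...'" line
#   file -> "Processing <name>.wml:" (the name may sit on the next line)
#   bad  -> the error message we are interested in
TOKEN_RE = re.compile(
    r"Entering directory '(?P<dir>[^']+)'"
    r"|Processing\s+(?P<file>\S+?\.wml):"
    r"|(?P<bad>Malformed UTF-8 character)"
)

def files_with_malformed_utf8(log_text):
    """Return the sorted 'directory/file.wml' entries that logged the error."""
    current_dir = None
    current_file = None
    affected = set()
    for match in TOKEN_RE.finditer(log_text):
        if match.group("dir"):
            current_dir = match.group("dir")
            current_file = None  # a new directory starts with no file yet
        elif match.group("file"):
            current_file = match.group("file")
        elif current_dir and current_file:
            # "Malformed UTF-8 character" seen while processing current_file
            affected.add(f"{current_dir}/{current_file}")
    return sorted(affected)
```

Note that for directories like News/weekly the log does not print the
issue subdirectory next to "Processing index.wml:", so the result for
those entries only narrows the problem down to the parent directory.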

LONG VERSION

I've done a test build of the /english and /chinese subdirs on a buster
machine, and I have noticed some warnings/errors related to the Chinese
pages (some, not all of them).

It would be desirable to upgrade the www-master machine to buster as soon
as possible, so any help with this (from website or Chinese team members)
is greatly appreciated.

Below you can find an extract of the build log, including only the
files for which I got some error or warning message.

After the build, I have compared the problematic HTML files of a build
in stretch and a build in buster with a diff tool, to see if there were
significant changes in the html output due to these issues.
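
For anybody who wants to repeat the comparison, a first pass can be
scripted. This is a rough sketch and not the tool I used: the function
name is hypothetical, and it assumes the two builds were installed into
two separate directory trees with the same layout.

```python
from pathlib import Path

def changed_html_files(old_root, new_root):
    """List .html files (relative paths) under old_root that are missing
    or byte-for-byte different under new_root."""
    old_root, new_root = Path(old_root), Path(new_root)
    changed = []
    for old_file in sorted(old_root.rglob("*.html")):
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        # Flag the page if the other build did not produce it at all,
        # or if its content differs in any byte.
        if not new_file.is_file() or old_file.read_bytes() != new_file.read_bytes():
            changed.append(rel.as_posix())
    return changed
```

If the generated pages embed something that changes on every build
(a build date, for example), every file will be reported, and the
flagged files still need a line-by-line diff to find the real changes.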

Here are my results:

* For the messages of the type ", [zh_TW]Invalid UTF8: " when building,
I could not see any difference between the output of a stretch build and
the output of a buster build.

I would say this is not a blocker for the buster upgrade of www-master.

* For the messages of the type "Malformed UTF-8 character [...] at
../../bin/tocn.pl [...]" I have seen important changes in the HTML diff.
I think the output in the stretch build is totally broken (fortunately,
there are not many files in that situation).

I would say this is a blocker for the buster upgrade of www-master, but
I would prefer somebody from the Chinese team to confirm (try to build
those files on a buster machine, and review the output).
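
One thing that could be checked first is whether the .wml sources
themselves contain truncated multibyte sequences such as the
"\xe9\x98\x0a" reported in the log, or whether the bytes only get broken
inside the conversion scripts. The log alone does not tell which one it
is. A minimal sketch for the first check, with a made-up function name,
assuming the sources are meant to be UTF-8:

```python
def invalid_utf8_lines(data):
    """Return (line_number, reason) for every line of the bytes `data`
    that does not decode as UTF-8."""
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.reason is Python's short description of the failure
            problems.append((number, err.reason))
    return problems
```

It takes the raw bytes of one file, e.g. the result of
open("index.wml", "rb").read() in chinese/devel/join. An empty list
would suggest that the source is fine and that the problem is in how
bin/tocn.pl and friends handle the text.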

Additional notes:

* I have only tested the wml build, not the rest of the cron scripts
that run on www-master. I will try to do it in the coming days, but
if you already know of any that work well (e.g. release-notes,
doc-manuals...) just tell me so I can skip them.

* When I build files on my machines, something is wrong in my
environment and the .po files are not always integrated, so for
example the Chinese pages I build show the menus and footnote in
English. Therefore, if there is any issue with the encoding of the .po
files themselves, I guess I cannot detect it until I fix my particular
issue :/

* The local build that I make uses the SAMPLE_FILES that are needed in
some folders; so additional issues may arise when we use the actual
files that are generated at runtime in the often and lessoften cron jobs.

That's all for now, I think. Thanks for your patience in reading this
and for your help!

Kind regards,
-- 
Laura Arjona Reina
https://wiki.debian.org/LauraArjona


--- extract of the build log file

/chinese

Processing
donations.wml:
[zh_CN]Invalid UTF8:
ïŒŒç‚¹å‡»â€œæ·»åŠ åˆ°èŽ­ç‰©èœŠâ€ïŒŒç„¶åŽå®Œæˆå‰©äœ™è¿‡çš‹ã€‚
, [zh_TW]Invalid UTF8:
ïŒŒç‚¹å‡»â€œæ·»åŠ åˆ°èŽ­ç‰©èœŠâ€ïŒŒç„¶åŽå®Œæˆå‰©äœ™è¿‡çš‹ã€‚
, [zh_HK]Invalid UTF8:
ïŒŒç‚¹å‡»â€œæ·»åŠ åˆ°èŽ­ç‰©èœŠâ€ïŒŒç„¶åŽå®Œæˆå‰©äœ™è¿‡çš‹ã€‚
.

make[1]: Entering directory '/webwml/chinese/Bugs'
Processing Reporting.wml: [zh_CN]Invalid UTF8:
°äž€æ¬¡ç€ºäŸ‹äŒšè¯çš„过皋。</li>
, [zh_TW]Invalid UTF8: °äž€æ¬¡ç€ºäŸ‹äŒšè¯çš„过皋。</li>
, [zh_HK]Invalid UTF8: °äž€æ¬¡ç€ºäŸ‹äŒšè¯çš„过皋。</li>
.

make[2]: Entering directory '/webwml/chinese/News/2000'

Processing 20000815.wml:
[zh_CN]Invalid UTF8: µ·å€–朋友的錎力協助包括
, [zh_TW]Invalid UTF8: µ·å€–朋友的錎力協助包括
, [zh_HK]Invalid UTF8: µ·å€–朋友的錎力協助包括
.

make[2]: Entering directory '/webwml/chinese/News/2009'
Processing 20090214.wml: [zh_CN]Invalid UTF8: šSun SPARC (sparc)、
, [zh_TW]Invalid UTF8: šSun SPARC (sparc)、
, [zh_HK]Invalid UTF8: šSun SPARC (sparc)、
.

make[2]: Entering directory '/webwml/chinese/News/weekly'

copying index.zh-cn.html to ../../../../www/News/weekly/./2002/48
Processing index.wml: [zh_CN]Malformed UTF-8 character (unexpected end
of string) in substitution (s///) at ../../bin/tocn.pl line 13, <> line 146.
Malformed UTF-8 character (unexpected end of string) in substitution
(s///) at ../../bin/tocn.pl line 15, <> line 146.
panic: do_trans_simple_utf8 line 362 at ../../bin/tocn.pl line 20, <>
line 146.
, [zh_TW]Invalid UTF8: å‘
, [zh_HK]Invalid UTF8: å‘
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2002/49

copying index.zh-cn.html to ../../../../www/News/weekly/./2003/09
Processing index.wml: [zh_CN]Invalid UTF8: –‡æª”描述了埞安裝
, [zh_TW]Invalid UTF8: –‡ä»¶æè¿°äº†åŸžå®‰è£
, [zh_HK]Invalid UTF8: –‡ä»¶æè¿°äº†åŸžå®‰è£
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2003/10
Processing index.wml: [zh_CN]Invalid UTF8: ˆ‘们的<a
href="../../../../events/talks">挔讲页面</a>来吗
, [zh_TW]Invalid UTF8: ˆ‘们的<a
href="../../../../events/talks">挔讲页面</a>来吗
, [zh_HK]Invalid UTF8: ˆ‘们的<a
href="../../../../events/talks">挔讲页面</a>来吗
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2012/15

make[1]: Entering directory '/webwml/chinese/devel'

Processing
testing.wml:
[zh_CN],
[zh_TW]Invalid
UTF8: ˆ°äº† 4
個䞍打算曎新的軟件包因爲它們會砎壞䟝賎。<q>(0)</q> 是無
, [zh_HK]Invalid
UTF8: ˆ°äº† 4
個䞍打算曎新的軟件包因爲它們會砎壞䟝賎。<q>(0)</q> 是無
.

make[2]: Entering directory '/webwml/chinese/devel/join'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../../bin/tocn.pl line
108, <> line 52.
, [zh_TW], [zh_HK].
copying index.zh-cn.html to ../../../../www/devel/join
copying index.zh-hk.html to ../../../../www/devel/join
copying index.zh-tw.html to ../../../../www/devel/join

make[1]: Entering directory '/webwml/chinese/international'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../bin/tocn.pl line 108,
<> line 89.
, [zh_TW]Invalid UTF8: …皋序
, [zh_HK]Invalid UTF8: …皋序
.

make[2]: Entering directory '/webwml/chinese/international/Chinese'

Processing thanks.wml: [zh_CN]Invalid UTF8: «™é»žçš„朋友
, [zh_TW]Invalid UTF8: «™é»žçš„朋友
, [zh_HK]Invalid UTF8: «™é»žçš„朋友
.

make[1]: Entering directory '/webwml/chinese/intro'
Processing about.wml: [zh_CN], [zh_TW], [zh_HK]panic: swash_fetch got
swatch of unexpected bit width, slen=512, needents=64 at ../bin/tohk.pl
line 131, <> line 95.
.

make -C legal install
make[1]: Entering directory '/webwml/chinese/legal'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../bin/tocn.pl line 108,
<> line 68.
, [zh_TW], [zh_HK].
copying index.zh-cn.html to ../../../www/legal
copying index.zh-hk.html to ../../../www/legal
copying index.zh-tw.html to ../../../www/legal

make[1]: Entering directory '/webwml/chinese/releases'

Processing proposed-updates.wml: [zh_CN],
[zh_TW]Invalid UTF8: ‰èƒœæœ€çµ‚到達 proposed-updates
, [zh_HK]Invalid UTF8: ‰èƒœæœ€çµ‚到達 proposed-updates
.

make[2]: Entering directory '/webwml/chinese/releases/hamm'
Processing HOWTO.upgrade.wml: [zh_CN], [zh_TW]Malformed UTF-8 character:
\xe5\x8c\x0a (unexpected non-continuation byte 0x0a, 2 bytes after start
byte 0xe5; need 3 bytes, got 2) in substitution (s///) at
../../bin/totw.pl line 111, <> line 71.
, [zh_HK].

