
Bug#959474: marked as done (Issues with Chinese language (all variants) when building some pages in buster)



Your message dated Thu, 07 Nov 2024 14:58:37 -0500
with message-id <48d66a077ed058fed8d30f5bf3822dddc983bfb2.camel@debian.org>
and subject line Re: Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
has caused the Debian Bug report #959474,
regarding Issues with Chinese language (all variants) when building some pages in buster
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
959474: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=959474
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: www.debian.org
Severity: normal
User: www.debian.org@packages.debian.org
Usertags: scripts
X-Debbugs-CC: debian-l10n-chinese@lists.debian.org
X-Debbugs-CC: debian-i18n@lists.debian.org

Hi all,

TL;DR

Some Chinese pages have issues when they are built on a buster
machine.
We need to fix those issues (at least the "Malformed UTF-8 character
[...] at ../../bin/tocn.pl [...]" ones) so that DSA can upgrade the
www-master machine to buster. See the build log extract at the bottom
for the files that produce this error.
I have no idea how to fix the issues, so any help from the Chinese
team or web team members is greatly appreciated.
Additional issues may arise (e.g. I have not yet tested the
release-notes or doc-manual builds), so any help with testing is
welcome too. Please create a bug report for each separate issue, or
update the existing ones. Thanks!

LONG VERSION

I've done a test build of the /english and /chinese subdirs on a buster
machine, and I noticed warnings/errors related to some (not all) of the
Chinese pages.

It would be desirable to upgrade the www-master machine to buster as
soon as possible, so any help with this (from website or Chinese team
members) would be much appreciated.

Below you can find an extract of the build log, including only the
files for which I got an error or warning message.

After the build, I compared the problematic HTML files from a stretch
build and a buster build with a diff tool, to see whether these issues
caused significant changes in the HTML output.
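
The comparison described above could be sketched roughly as follows in
Python. This is only an illustration and does not reproduce the diff
tool actually used; the function name html_diff and any paths passed to
it are hypothetical.

```python
import difflib
from pathlib import Path
from typing import List


def html_diff(stretch_file: Path, buster_file: Path) -> List[str]:
    """Return unified-diff lines between two builds of the same page."""
    # errors="replace" keeps the comparison going even when one build
    # contains invalid UTF-8; the bad bytes then show up as U+FFFD.
    old = stretch_file.read_text(encoding="utf-8", errors="replace").splitlines()
    new = buster_file.read_text(encoding="utf-8", errors="replace").splitlines()
    return list(
        difflib.unified_diff(
            old,
            new,
            fromfile=str(stretch_file),
            tofile=str(buster_file),
            lineterm="",
        )
    )
```

An empty result means the two builds produced identical text for that
page; any other result lists the lines that differ.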

Here are my results:

* For the messages of the type ", [zh_TW]Invalid UTF8: " when building,
I could not see any difference between the output of a stretch build
and the output of a buster build.

I would say this is not a blocker for the buster upgrade of www-master.

* For the messages of the type "Malformed UTF-8 character [...] at
../../bin/tocn.pl [...]" I have seen important changes in the HTML diff,
I think the output in the stretch build is totally broken (fortunately,
there are not many files in that situation).

I would say this is a blocker for the buster upgrade of www-master, but
I would prefer someone from the Chinese team to confirm it (by building
those files on a buster machine and reviewing the output).
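
To make the "Malformed UTF-8 character" messages easier to read: the
build log extract at the end of this message reports byte sequences
such as \xe9\x98\x0a, i.e. a start byte that announces a 3-byte
character, one continuation byte, and then a newline where the third
byte should be. The Python sketch below only illustrates that
arithmetic; it is not a port of tocn.pl, and the function names are
hypothetical.

```python
def expected_length(start_byte: int) -> int:
    """Length of a UTF-8 sequence as announced by its first byte."""
    if start_byte < 0x80:
        return 1
    if 0xC2 <= start_byte <= 0xDF:
        return 2
    if 0xE0 <= start_byte <= 0xEF:
        return 3
    if 0xF0 <= start_byte <= 0xF4:
        return 4
    raise ValueError("not a valid UTF-8 start byte: 0x%02x" % start_byte)


def describe_truncation(data: bytes) -> str:
    """Report how many bytes the first character needs and how many it got."""
    need = expected_length(data[0])
    got = 1
    for byte in data[1:need]:
        # Continuation bytes are always in the range 0x80..0xBF.
        if 0x80 <= byte <= 0xBF:
            got += 1
        else:
            break
    return "need %d bytes, got %d" % (need, got)
```

With the bytes from the log, describe_truncation(b"\xe9\x98\x0a")
returns "need 3 bytes, got 2", which matches the wording of the Perl
message. This suggests a multi-byte character was cut in two before the
substitution ran, but the sketch does not show where that happens.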

Additional notes:

* I have only tested the wml build, not the rest of the cron scripts
that run on www-master. I will try to do that in the coming days, but
if you already know of any that work well (e.g. release-notes,
doc-manuals...), just tell me so I can skip them.

* When I build files on my machines, something is wrong in my
environment and the .po files are not always integrated, so for
example the Chinese pages I build show the menus and footer in
English. Therefore, if there is any issue with the encoding of the .po
files themselves, I guess I cannot detect it until I fix my particular
issue :/

* The local build that I make uses the SAMPLE_FILES that are needed in
some folders; so additional issues may arise when we use the actual
files that are generated at runtime in the often and lessoften cron jobs.
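
Regarding the .po files mentioned in the notes above: one rough way to
rule out encoding problems in the files themselves, independently of
the wml build, is to check that each file decodes as UTF-8. The sketch
below assumes the files are meant to be UTF-8 (it does not read the
charset declared in the .po header), and the function name and default
pattern are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def find_invalid_utf8(root: Path, pattern: str = "*.po") -> List[Tuple[Path, int]]:
    """Return (file, byte offset) for every matching file that is not valid UTF-8."""
    problems = []
    for path in sorted(root.rglob(pattern)):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte.
            problems.append((path, err.start))
    return problems
```

An empty list means every matching file under the given directory
decoded cleanly; passing a different pattern such as "*.wml" would
check the page sources the same way.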

That's all for now, I think. Thanks for your patience in reading this
and for your help!

Kind regards,
-- 
Laura Arjona Reina
https://wiki.debian.org/LauraArjona


--- extract of the build log file

/chinese

Processing
donations.wml:
[zh_CN]Invalid UTF8:
ïŒŒç‚¹å‡»â€œæ·»åŠ åˆ°èŽ­ç‰©èœŠâ€ïŒŒç„¶åŽå®Œæˆå‰©äœ™è¿‡çš‹ã€‚
, [zh_TW]Invalid UTF8:
ïŒŒç‚¹å‡»â€œæ·»åŠ åˆ°èŽ­ç‰©èœŠâ€ïŒŒç„¶åŽå®Œæˆå‰©äœ™è¿‡çš‹ã€‚
, [zh_HK]Invalid UTF8:
ïŒŒç‚¹å‡»â€œæ·»åŠ åˆ°èŽ­ç‰©èœŠâ€ïŒŒç„¶åŽå®Œæˆå‰©äœ™è¿‡çš‹ã€‚
.

make[1]: Entering directory '/webwml/chinese/Bugs'
Processing Reporting.wml: [zh_CN]Invalid UTF8:
°äž€æ¬¡ç€ºäŸ‹äŒšè¯çš„过皋。</li>
, [zh_TW]Invalid UTF8: °äž€æ¬¡ç€ºäŸ‹äŒšè¯çš„过皋。</li>
, [zh_HK]Invalid UTF8: °äž€æ¬¡ç€ºäŸ‹äŒšè¯çš„过皋。</li>
.

make[2]: Entering directory '/webwml/chinese/News/2000'

Processing 20000815.wml:
[zh_CN]Invalid UTF8: µ·å€–朋友的錎力協助包括
, [zh_TW]Invalid UTF8: µ·å€–朋友的錎力協助包括
, [zh_HK]Invalid UTF8: µ·å€–朋友的錎力協助包括
.

make[2]: Entering directory '/webwml/chinese/News/2009'
Processing 20090214.wml: [zh_CN]Invalid UTF8: šSun SPARC (sparc)、
, [zh_TW]Invalid UTF8: šSun SPARC (sparc)、
, [zh_HK]Invalid UTF8: šSun SPARC (sparc)、
.

make[2]: Entering directory '/webwml/chinese/News/weekly'

copying index.zh-cn.html to ../../../../www/News/weekly/./2002/48
Processing index.wml: [zh_CN]Malformed UTF-8 character (unexpected end
of string) in substitution (s///) at ../../bin/tocn.pl line 13, <> line 146.
Malformed UTF-8 character (unexpected end of string) in substitution
(s///) at ../../bin/tocn.pl line 15, <> line 146.
panic: do_trans_simple_utf8 line 362 at ../../bin/tocn.pl line 20, <>
line 146.
, [zh_TW]Invalid UTF8: å‘
, [zh_HK]Invalid UTF8: å‘
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2002/49

copying index.zh-cn.html to ../../../../www/News/weekly/./2003/09
Processing index.wml: [zh_CN]Invalid UTF8: –‡æª”描述了埞安裝
, [zh_TW]Invalid UTF8: –‡ä»¶æè¿°äº†åŸžå®‰è£
, [zh_HK]Invalid UTF8: –‡ä»¶æè¿°äº†åŸžå®‰è£
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2003/10
Processing index.wml: [zh_CN]Invalid UTF8: ˆ‘们的<a
href="../../../../events/talks">挔讲页面</a>来吗
, [zh_TW]Invalid UTF8: ˆ‘们的<a
href="../../../../events/talks">挔讲页面</a>来吗
, [zh_HK]Invalid UTF8: ˆ‘们的<a
href="../../../../events/talks">挔讲页面</a>来吗
.
copying index.zh-cn.html to ../../../../www/News/weekly/./2012/15

make[1]: Entering directory '/webwml/chinese/devel'

Processing
testing.wml:
[zh_CN],
[zh_TW]Invalid
UTF8: ˆ°äº† 4
個䞍打算曎新的軟件包因爲它們會砎壞䟝賎。<q>(0)</q> 是無
, [zh_HK]Invalid
UTF8: ˆ°äº† 4
個䞍打算曎新的軟件包因爲它們會砎壞䟝賎。<q>(0)</q> 是無
.

make[2]: Entering directory '/webwml/chinese/devel/join'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../../bin/tocn.pl line
108, <> line 52.
, [zh_TW], [zh_HK].
copying index.zh-cn.html to ../../../../www/devel/join
copying index.zh-hk.html to ../../../../www/devel/join
copying index.zh-tw.html to ../../../../www/devel/join

make[1]: Entering directory '/webwml/chinese/international'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../bin/tocn.pl line 108,
<> line 89.
, [zh_TW]Invalid UTF8: …皋序
, [zh_HK]Invalid UTF8: …皋序
.

make[2]: Entering directory '/webwml/chinese/international/Chinese'

Processing thanks.wml: [zh_CN]Invalid UTF8: «™é»žçš„æœ‹å‹
, [zh_TW]Invalid UTF8: «™é»žçš„æœ‹å‹
, [zh_HK]Invalid UTF8: «™é»žçš„æœ‹å‹
.

make[1]: Entering directory '/webwml/chinese/intro'
Processing about.wml: [zh_CN], [zh_TW], [zh_HK]panic: swash_fetch got
swatch of unexpected bit width, slen=512, needents=64 at ../bin/tohk.pl
line 131, <> line 95.
.

make -C legal install
make[1]: Entering directory '/webwml/chinese/legal'
Processing index.wml: [zh_CN]Malformed UTF-8 character: \xe9\x98\x0a
(unexpected non-continuation byte 0x0a, 2 bytes after start byte 0xe9;
need 3 bytes, got 2) in substitution (s///) at ../bin/tocn.pl line 108,
<> line 68.
, [zh_TW], [zh_HK].
copying index.zh-cn.html to ../../../www/legal
copying index.zh-hk.html to ../../../www/legal
copying index.zh-tw.html to ../../../www/legal

make[1]: Entering directory '/webwml/chinese/releases'

Processing proposed-updates.wml: [zh_CN],
[zh_TW]Invalid UTF8: ‰èƒœæœ€çµ‚到達 proposed-updates
, [zh_HK]Invalid UTF8: ‰èƒœæœ€çµ‚到達 proposed-updates
.

make[2]: Entering directory '/webwml/chinese/releases/hamm'
Processing HOWTO.upgrade.wml: [zh_CN], [zh_TW]Malformed UTF-8 character:
\xe5\x8c\x0a (unexpected non-continuation byte 0x0a, 2 bytes after start
byte 0xe5; need 3 bytes, got 2) in substitution (s///) at
../../bin/totw.pl line 111, <> line 71.
, [zh_HK].

--- End Message ---
--- Begin Message ---
X-Debbugs-CC: cwryu@debian.org

On Tue, 09 Jun 2020 20:44:02 -0400 Boyuan Yang <byang@debian.org> wrote:
> On Sunday, 2020-06-07 at 21:23 +0200, Laura Arjona Reina wrote:
> > Hi
> > 
> > On 7/6/20 at 16:02, Axel Beckert wrote:
> > 
> > > Just to be sure: I should still provide a stable update for buster,
> > > right?
> > > 
> > 
> > I don't know if the type of bug qualifies for a stable update.
> 
> If I were the maintainer, I would give it a try to make the stable
> update. (Why not?)
> 
> > For www.debian.org, we'll be using the -O1 workaround for building
> > the Chinese pages, and that's about optimization, we don't lose any
> > functionality, so I think we can wait for bullseye.
> > 
> > Boyuan, please correct me if I am wrong...
> 
> If we have the workaround applied, website building with Chinese
> contents should not be an issue anymore.

Now that the fix in the wml software has been present since
oldoldstable, I consider this bug fixed. The commit that reverts the
-O1 option for the Chinese build is at
https://salsa.debian.org/webmaster-team/webwml/-/commit/6817ed4ca24fb41283784ab5de2cad9d9676e460 .

Regarding https://bugs.debian.org/959474#93 : I believe the same
option can be removed from the Korean build as well.

Thanks,
Boyuan Yang

--- End Message ---
