Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
Laura Arjona Reina <email@example.com> wrote:
> There are some issues with some Chinese pages when they are built in a
> buster machine.
> We need to fix those issues (at least the "Malformed UTF-8 character
> [...] at ../../bin/tocn.pl [...]" ones) so DSA can upgrade the
> www-master machine to buster. See the summary of the log at the bottom
> to know which files produce this error.
> I have no idea how to fix the issues, so any help from the Chinese
> team or web team mates is greatly appreciated.
> Additional issues may arise (e.g. I haven't tested the release-notes
> or doc-manual yet), so any help with testing is welcome too; please create
> bug reports for each different issue or update the existing ones. Thanks!
> LONG VERSION
> I've done a test build of the /english and /chinese subdirs in a buster
> machine, and I have noticed some warnings/errors related to the Chinese
> pages (some, not all of them).
> It would be desirable to upgrade www-master machine to buster as soon as
> possible, so any help with this (from website or Chinese team members)
> is much appreciated.
> Below you can find an extract of the build log, including only the
> files for which I got some error or warning message.
> After the build, I have compared the problematic HTML files of a build
> in stretch and a build in buster with a diff tool, to see if there were
> significant changes in the html output due to these issues.
> Here are my results:
> * For the messages of the type ", [zh_TW]Invalid UTF8: " when building,
> I couldn't find any difference between the output of a stretch build and
> the output of a buster build.
> I would say this is not a blocker for the buster upgrade of www-master.
I don't know what I did differently from Laura, but here some of the built HTML
files with "Invalid UTF8: ... " messages are lacking much of the content compared
to the ones currently at www-master.
So maybe those are serious as well.
> * For the messages of the type "Malformed UTF-8 character [...] at
> ../../bin/tocn.pl [...]" I have seen important changes in the HTML diff,
> I think the output in the stretch build is totally broken (fortunately,
> there are not many files in that situation).
> I would say this is a blocker for the buster upgrade of www-master, but
> I would prefer somebody of the Chinese team to confirm (try to build
> those files in a buster machine, and review the output).
Maybe someone from the Chinese team can solve this, but if not, I want
to propose a possible (temporary) solution:
If I delete the files below from the webwml/chinese tree, I can build
chinese without any errors. So we could probably go with a workaround like this:
delete these files to get the upgrade blockers out of the way, upgrade
wolkenstein to buster, and then try to re-add the files step by step, maybe
with some modifications at some point, to get the original situation back.
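
When re-adding the files step by step, it may help to check first which
source files contain byte sequences that are not valid UTF-8. The following
is only a rough sketch, not something taken from the webwml tooling: the
function name find_invalid_utf8, the default ".wml" suffix and the "chinese"
default directory are assumptions for illustration. Note also that the build
errors could come from the conversion done in tocn.pl rather than from the
source files themselves, so a clean result here does not prove a file is safe.

```python
import sys
from pathlib import Path


def find_invalid_utf8(root, suffix=".wml"):
    """Return the sorted paths below root whose bytes are not valid UTF-8.

    Hypothetical helper: 'suffix' is an assumed file extension, adjust it
    to whatever source files you want to check.
    """
    bad = []
    for path in Path(root).rglob("*" + suffix):
        if not path.is_file():
            continue
        try:
            # Strict decoding raises on any malformed byte sequence.
            path.read_bytes().decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Assumed usage: pass the tree to scan, e.g. the chinese subdirectory.
    tree = sys.argv[1] if len(sys.argv) > 1 else "chinese"
    for bad_file in find_invalid_utf8(tree):
        print(bad_file)
```

Files listed by such a scan would be candidates for a closer look (or for
re-encoding) before they are put back into the tree.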
Holger Wansing <firstname.lastname@example.org>
PGP-Fingerprint: 496A C6E8 1442 4B34 8508 3529 59F1 87CA 156E B076
To get rid of the "Invalid UTF8: ..." or "Malformed UTF-8 character ..." build
errors, delete these files from chinese: