http://www.debian.org/sitemap.ja.html generation error etc.
Hi,
I have changed the sitemap and the DWN generation script so that they fix
the Japanese pages as described below. Please rebuild the pages if needed,
and poke me if I made any mistakes.
The cvs diff looked good, so I committed the changes:
/cvsroot/webwml/webwml/english/sitemap.wml,v <-- sitemap.wml
new revision: 1.41; previous revision: 1.40
/cvsroot/webwml/webwml/english/News/weekly/dwn-to-rdf.pl,v <-- News/weekly/dwn-to-rdf.pl
new revision: 1.11; previous revision: 1.10
=================================================
I thought we had completed a good UTF-8 transition after regenerating some
news pages. Alas... I found an issue.
http://www.debian.org/sitemap.ja.html
Each line ends with an "ESC ( B" sequence.
This is an ISO 2022 ( http://en.wikipedia.org/wiki/ISO/IEC_2022 ) escape
sequence that switches back to ASCII (1 byte per character).
It made sense when this page was encoded in 7-bit ISO-2022-JP, but it
makes no sense now that the page is UTF-8.
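For anyone curious where the sequence comes from, Python's standard codecs reproduce it: encoding Japanese text as ISO-2022-JP shifts into JIS X 0208 with "ESC $ B" and back to ASCII with "ESC ( B", while UTF-8 uses no escape bytes at all. This is only an illustration with a sample string I picked; it is not part of the webwml build.

```python
# Illustration only: where "ESC ( B" comes from.
ESC_TO_JIS = b"\x1b$B"    # ISO-2022-JP: designate JIS X 0208 (2 bytes per character)
ESC_TO_ASCII = b"\x1b(B"  # ISO 2022: designate ASCII (1 byte per character)

sample = "セキュリティ"  # sample text, not taken from the sitemap

jis_bytes = sample.encode("iso2022_jp")
utf8_bytes = sample.encode("utf-8")

# ISO-2022-JP wraps the Japanese run in shift sequences...
assert jis_bytes.startswith(ESC_TO_JIS)
assert jis_bytes.endswith(ESC_TO_ASCII)
# ...while UTF-8 needs no escape bytes, so a leftover ESC ( B is just garbage.
assert b"\x1b" not in utf8_bytes
print(jis_bytes)
```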
I do not know how to fix it. I have updated japanese/.wmlrc as follows:
-D CUR_LANG=Japanese
-D CUR_ISO_LANG=ja
-D CUR_LOCALE=ja_JP.UTF-8
-D CHARSET=utf-8
-D HOME~.
-D INTRO~intro
-D DEVEL~devel
-D DOC~doc
-D DISTRIB~distrib
-D MISC~misc
-D BUGS~Bugs
-D PICS~Pics
-D STYLE~style
-D VOTE~vote
This sequence is clearly added by webwml when it generates
sitemap.ja.html from each file's header.
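To confirm that a rebuilt page no longer carries the sequence, a small checker along these lines could scan the generated HTML as raw bytes. The function name is hypothetical; this is a sketch, not an existing webwml tool.

```python
ESC_TO_ASCII = b"\x1b(B"  # ISO 2022 "switch to ASCII" sequence


def lines_ending_with_esc_b(data: bytes) -> list[int]:
    """Return 1-based numbers of lines that end with ESC ( B.

    Trailing spaces and tabs are ignored; the input is raw bytes so the
    check works whatever encoding the page claims to use.
    """
    hits = []
    for number, line in enumerate(data.splitlines(), start=1):
        if line.rstrip(b" \t").endswith(ESC_TO_ASCII):
            hits.append(number)
    return hits
```

Usage would be something like `with open("sitemap.ja.html", "rb") as page: print(lines_ending_with_esc_b(page.read()))`; an empty list means the page is clean.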
.... aha... sitemap.wml has a funny special case. I am removing it now.
I checked the english source with: grep -R "Japanese" *
english/News/weekly/dwn-to-rdf.pl has oddly encoded Japanese text too.
It is in EUC-JP; it should be "セキュリティ上の更新。" ("Security updates.")
in UTF-8.
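The conversion itself is just a decode/re-encode round trip. In this sketch the phrase is encoded to EUC-JP as a stand-in for the bytes in dwn-to-rdf.pl, assuming those bytes are valid EUC-JP; the helper name is hypothetical.

```python
def euc_jp_to_utf8(raw: bytes) -> bytes:
    """Re-encode EUC-JP bytes as UTF-8 (raises UnicodeDecodeError if raw is not EUC-JP)."""
    return raw.decode("euc_jp").encode("utf-8")


phrase = "セキュリティ上の更新。"
euc = phrase.encode("euc_jp")   # stand-in for the bytes currently in dwn-to-rdf.pl
utf8 = euc_jp_to_utf8(euc)

# For this phrase: 2 bytes per character in EUC-JP, 3 bytes per character in UTF-8.
print(len(euc), len(utf8))
```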
This file is difficult to edit because it mixes encodings. Vim is too
smart for this, so I used mcedit, a dumb 8-bit-clean editor.
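An editor-free alternative, sketched here as an assumption rather than what I actually did, is to treat the file as raw bytes: replace only the EUC-JP encoding of the known phrase with its UTF-8 encoding, leave every other byte alone, and then check that the result decodes as UTF-8. Both function names are hypothetical.

```python
def replace_euc_jp_phrase(data: bytes, phrase: str) -> bytes:
    """Replace the EUC-JP encoding of `phrase` with its UTF-8 encoding.

    Works purely on bytes, so the rest of a mixed-encoding file is untouched.
    """
    old = phrase.encode("euc_jp")
    new = phrase.encode("utf-8")
    if old not in data:
        raise ValueError("EUC-JP phrase not found")
    return data.replace(old, new)  # rewrites every occurrence


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None
```

Since `bytes.replace` rewrites every match, reviewing the cvs diff afterwards is still worthwhile.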
Osamu