
Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)



On Tue, Jan 13, 2004 at 06:12:19PM +0900, Tomohiro KUBOTA wrote:
[...]
> However, in reality, the page is written in EUC-JP.
> Because of this inconsistency, web browsers will render the page
> by assuming the page is UTF-8 and the result will be the Mojibake.
[...]

I do not know how packages.debian.org is generated, but this mismatch
may be due to how Japanese pages are encoded under webwml/japanese/.
Japanese translators decided to generate HTML pages in ISO-2022-JP,
and because this is their preferred encoding, the .wml files are
ISO-2022-JP as well.  Unfortunately, WML cannot handle this encoding,
so the .wml files are preprocessed and recoded into EUC-JP, normal WML
processing then takes place, and the output files are converted back
to ISO-2022-JP.
For this reason, mojibake also occurs in all index files that include
material from other pages (/events, /News, etc.).  I am going to commit
a fix.
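For readers unfamiliar with the recoding round trip, here is a rough
sketch of it in Python, using the standard "iso2022_jp" and "euc_jp"
codecs.  It is only an illustration under assumptions: wml_process is a
hypothetical stand-in for the real WML run (it just wraps the text in a
template), and read_as_utf8 mimics a browser that trusts a UTF-8 label
on bytes that are really EUC-JP, which is the mismatch reported above.

```python
# Hypothetical sketch of the ISO-2022-JP -> EUC-JP -> WML -> ISO-2022-JP
# round trip; wml_process is a placeholder, not the real WML tool.

def wml_process(euc_source: bytes) -> bytes:
    """Placeholder for WML (assumption): wraps EUC-JP input in a template."""
    header = "<html><body>\n".encode("euc_jp")
    footer = "\n</body></html>\n".encode("euc_jp")
    return header + euc_source + footer


def build_page(wml_iso2022: bytes) -> bytes:
    # 1. Preprocess: recode the ISO-2022-JP .wml source into EUC-JP.
    euc = wml_iso2022.decode("iso2022_jp").encode("euc_jp")
    # 2. Normal WML processing on EUC-JP input.
    html_euc = wml_process(euc)
    # 3. Convert the generated page back to ISO-2022-JP.
    return html_euc.decode("euc_jp").encode("iso2022_jp")


def read_as_utf8(data: bytes) -> str:
    """Decode bytes the way a browser would if told they are UTF-8."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    page = build_page("日本語のページ".encode("iso2022_jp"))
    print(page.decode("iso2022_jp"))
    # EUC-JP bytes mislabelled as UTF-8 come out as replacement characters.
    print(read_as_utf8("日本語".encode("euc_jp")))
```

The point of the sketch is that every stage must agree on which
encoding it is handling; material that skips one of the conversions (or
is labelled with the wrong charset) is where mojibake appears.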

Denis


