
Bug#227273: packages.debian.org: charset mismatch (always in UTF-8?)



Hi,

> Hmm, I have difficulties to understand what you mean. I will try to
> formulate your report in my own words:
> Currently the Japanese pages are served in UTF-8 (is this right?),
> but you request that we serve it in iso-2022-jp instead, because
> UTF-8 causes problems in reading the pages.
> 
> Have I understood you correctly?

Let me explain.  (I am the author of the "Mojibake" page that
Yamane-san referred to.)

For example, please see

   http://packages.debian.org/stable/misc/language-env.ja.html

The HTML source of the page declares that the page is written in
UTF-8.  (See the 6th line:

   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

).  In reality, however, the page is written in EUC-JP.
Because of this inconsistency, web browsers render the page on the
assumption that it is UTF-8, and the result is Mojibake (garbled
characters).
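
The mismatch can be reproduced with a few lines of Python.  This is
only a minimal sketch: the sample string is made up for illustration
and is not taken from the actual page.

```python
# Illustrative text only; not the real page content.
text = "日本語のページ"

# What the server actually sends: EUC-JP bytes.
raw = text.encode("euc_jp")

# A browser that trusts the declared charset decodes the bytes as UTF-8.
# Invalid sequences become U+FFFD, which is what the reader sees as Mojibake.
as_declared = raw.decode("utf-8", errors="replace")

# Decoding with the real encoding recovers the original text.
as_actual = raw.decode("euc_jp")

print(as_declared == text)  # False
print(as_actual == text)    # True
```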

This inconsistency is the core of the problem, so at the very
least it must be fixed.

There are several ways to fix this problem.

(a) Change the encoding of the page to UTF-8 (to match the 6th line).
(b) Change the 6th line to EUC-JP (to match the real content).
(c) Change both the 6th line and the encoding of the page to some
    other encoding (for example ISO-2022-JP).
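
The three options can be sketched in Python as follows.  This is
only an illustration, not the code that generates the pages: the
helper names (declare, fix_a, fix_b, fix_c) and the sample page are
hypothetical, and the input is assumed to be EUC-JP bytes carrying
the UTF-8 declaration quoted above.

```python
import re

# Matches the charset declaration quoted in this message (the 6th line).
META_RE = re.compile(
    rb'(<meta http-equiv="Content-Type" content="text/html; charset=)[^"]+(">)'
)

def declare(page_bytes, label):
    """Rewrite only the declared charset; the content bytes are untouched."""
    return META_RE.sub(rb"\g<1>" + label.encode("ascii") + rb"\g<2>", page_bytes)

def fix_a(page_bytes):
    """(a) Re-encode the EUC-JP content as UTF-8 to match the declaration."""
    return page_bytes.decode("euc_jp").encode("utf-8")

def fix_b(page_bytes):
    """(b) Keep the EUC-JP content and declare EUC-JP."""
    return declare(page_bytes, "EUC-JP")

def fix_c(page_bytes):
    """(c) Declare ISO-2022-JP and re-encode the content to match."""
    relabeled = declare(page_bytes, "ISO-2022-JP")
    return relabeled.decode("euc_jp").encode("iso2022_jp")

# Hypothetical sample page: declared as UTF-8 but encoded as EUC-JP.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
    "<p>日本語</p>\n"
).encode("euc_jp")

results = [(fix_a(page), "utf-8"), (fix_b(page), "euc_jp"), (fix_c(page), "iso2022_jp")]
for fixed, codec in results:
    # After each fix, the declared charset and the real encoding agree.
    print(fixed.decode(codec).splitlines()[0])
```

In all three cases the declaration and the bytes end up consistent;
the options differ only in which encoding is chosen.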

Yamane-san asks for solution (c).  The reason is that ISO-2022-JP
is the best encoding for Japanese web pages: it is the least likely
to cause Mojibake even when a web browser cannot understand the
6th line.  I agree that (c) is the best solution, but I do not
consider (a) or (b) unacceptable.
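
One property that probably contributes to this (my reading here, it
is not spelled out above) is that ISO-2022-JP is a 7-bit encoding
that announces Japanese text with escape sequences, so it can be
recognized without any charset declaration.  A small Python sketch
with a made-up sample string:

```python
# Illustrative text only.
text = "日本語"

iso = text.encode("iso2022_jp")
euc = text.encode("euc_jp")

# ISO-2022-JP switches to JIS X 0208 with ESC $ B and back to ASCII with ESC ( B.
print(iso.startswith(b"\x1b$B"), iso.endswith(b"\x1b(B"))

# Every ISO-2022-JP byte is 7-bit, so the escape sequences are the only marker needed.
print(all(byte < 0x80 for byte in iso))

# EUC-JP uses 8-bit bytes with no such marker, so a browser has to be told
# (or guess) which encoding it is.
print(any(byte >= 0x80 for byte in euc))
```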

I think EUC-JP (solution (b)) would be acceptable, because recent
web browsers are likely to understand the 6th line.  (However,
UTF-8 (solution (a)) should be avoided if possible, because some
browsers, such as w3m (popular in Japan), cannot handle UTF-8.)

In short, my opinion is:
(c) is the best solution.
(b) is also fine.
(a) should be avoided if possible.

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/


