
Re: "Sitemap" webpage



Hi,

At Thu, 5 Jul 2001 17:36:39 +0100,
David Starner <dstarner98@aasaa.ofe.org> wrote:

> Doesn't ISO-2022-JP have a form that invokes JIS X 0208 into the upper half?
> Could SJIS be used instead?

No.


Some additional explanation of the actual state of Japanese encodings:

There are three popular encodings for Japanese web pages -- ISO-2022-JP,
Shift_JIS, and EUC-JP.  ISO-2022-JP is a 7-bit stateful encoding (i.e.,
it has a state which is changed by escape sequences), while Shift_JIS
and EUC-JP are 8-bit stateless encodings.  Web browsers sometimes have
to detect automatically the encoding of the web pages to be displayed.

Since Shift_JIS and EUC-JP share many byte values, web browsers are
sometimes confused between the two.  ISO-2022-JP, on the other hand,
identifies itself through its escape sequences, so browsers cannot
mistake it for another encoding.
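
To illustrate the point above, here is a minimal Python sketch.  The
function name looks_like_iso2022jp is hypothetical, and the check is a
deliberate simplification (real browsers use more elaborate detection);
it only assumes the standard Python codecs iso2022_jp, euc_jp and
shift_jis.

```python
def looks_like_iso2022jp(data: bytes) -> bool:
    """Hypothetical check: pure 7-bit data that contains an escape
    sequence designating JIS X 0208 (ESC $ B or ESC $ @)."""
    is_7bit = all(b < 0x80 for b in data)
    has_kanji_escape = b"\x1b$B" in data or b"\x1b$@" in data
    return is_7bit and has_kanji_escape


text = "あいう"
iso = text.encode("iso2022_jp")
euc = text.encode("euc_jp")

# The escape sequence makes ISO-2022-JP recognizable from the bytes alone.
print(looks_like_iso2022jp(iso))
print(looks_like_iso2022jp(euc))

# The same EUC-JP bytes are also acceptable as Shift_JIS (they fall in
# the half-width katakana range), but they decode to different text.
as_euc = euc.decode("euc_jp")
as_sjis = euc.decode("shift_jis")
print(as_euc == text, as_sjis == text)
```

The last line is the confusing case: both decodings succeed, so the
bytes alone do not tell a browser which one was intended.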

Note that newer web browsers which understand
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=foobar">
will NOT be confused by any encoding, as long as the page declares it.
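
As a rough sketch of how such a declaration can be read before the page
is decoded: the tag itself is plain ASCII in all three encodings, so it
can be searched for in the raw bytes.  The function name
declared_charset and the 1024-byte limit are assumptions made for this
example, not what any particular browser does.

```python
import re
from typing import Optional

# Matches charset=NAME with optional quotes, case-insensitively.
_CHARSET_RE = re.compile(
    rb"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def declared_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset named near the start of the page, or None."""
    match = _CHARSET_RE.search(raw[:limit])
    return match.group(1).decode("ascii") if match else None


page = b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=EUC-JP">'
print(declared_charset(page))
print(declared_charset(b"<html><body>no declaration</body></html>"))
```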

Thus, migrating to EUC-JP may be a solution.  Shift_JIS could also work,
with two caveats: it lacks JIS X 0212, which is a minor problem, and
0x40-0x7e can appear as the second byte of a double-byte character.
0x22 (the double quote) is outside that range and therefore safe, but
the range includes 0x5c, i.e., backslash.  UTF-8 is not popular yet and
some browsers may fail to display it, though I think the situation
will change in five or ten years.
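
The backslash issue can be checked with a short Python sketch.  The
helper name chars_with_ascii_trail_byte is hypothetical; the example
only assumes the standard shift_jis and euc_jp codecs.

```python
from typing import List


def chars_with_ascii_trail_byte(text: str, encoding: str) -> List[str]:
    """List characters whose multi-byte encoding contains a byte
    below 0x80, i.e. one that a byte-oriented tool could mistake
    for an ASCII character such as backslash."""
    risky = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            risky.append(ch)
    return risky


sample = "ソ表"
print(chars_with_ascii_trail_byte(sample, "shift_jis"))
print(chars_with_ascii_trail_byte(sample, "euc_jp"))

# In Shift_JIS the second byte of these characters is 0x5c.
print(0x5C in "ソ".encode("shift_jis"))
```

In EUC-JP every byte of a double-byte character has the high bit set,
so a tool that treats backslash specially cannot damage such text.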


On the other hand, better wml handling may be another (and better)
solution.  Though I do not know the wml parser well, I think it is
possible, because the title of each page listed in the sitemap has no
problem.  (I.e., the "ports/" item in the sitemap page is broken while
the title of the "ports/" page is good.)  Thus, I expect that someone
who is familiar with wml programming can find this solution.

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


