Re: Translators! What's your charset?

To: debian-www@lists.debian.org
Subject: Re: Translators! What's your charset?
From: Fumitoshi UKAI <ukai@debian.or.jp>
Date: Thu, 12 Oct 2000 12:08:58 +0900
Message-id: <87d7h6lprp.wl@lichee.ukai.org>
In-reply-to: In your message of "Wed, 11 Oct 2000 14:33:30 +1100" <20001011143330.A22138@eye-net.com.au>
References: <20001010230504.A12895@eye-net.com.au> <20001010232933.B2621@debian.org> <20001011143330.A22138@eye-net.com.au>

At Wed, 11 Oct 2000 14:33:30 +1100,
Craig Small <csmall@eye-net.com.au> wrote:

> I cannot do the following ones until I get a charset:

> jp - iso-2022-jp
  It should be `ja' if you mean the language code, not the country code.

Anyway, iso-2022-jp is not that simple.
It mixes US-ASCII characters with escape sequences such as "ESC $ B".
When the byte stream reaches "ESC $ B" or "ESC $ @", it switches to
JIS X 0208 characters, a 94x94 set in which every character takes
2 bytes, until "ESC ( B" or "ESC ( J" switches it back to US-ASCII
(or JIS X 0201 Roman).
So we can't say which set of bytes makes up a word in the iso-2022-jp
encoding: the same byte value means different things depending on the
escape sequence that precedes it.
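
To illustrate the switching described above, here is a minimal Python
sketch. It is only an illustration under assumptions: the function name
split_segments and the charset labels are made up for this example, and
it handles just the four escape sequences mentioned in this message.

```python
# Escape sequences used by iso-2022-jp (see RFC 1468).
# The labels on the right are illustrative names, not official ones.
ESCAPES = {
    b"\x1b(B": "ascii",            # ESC ( B
    b"\x1b(J": "jisx0201-roman",   # ESC ( J
    b"\x1b$@": "jisx0208-1978",    # ESC $ @
    b"\x1b$B": "jisx0208-1983",    # ESC $ B
}


def split_segments(data: bytes):
    """Split an iso-2022-jp byte string into (charset, payload) runs."""
    segments = []
    charset = "ascii"   # text starts in US-ASCII
    start = 0
    i = 0
    while i < len(data):
        esc = data[i:i + 3]
        if esc in ESCAPES:
            if i > start:
                segments.append((charset, data[start:i]))
            charset = ESCAPES[esc]
            i += 3
            start = i
        elif charset.startswith("jisx0208"):
            i += 2      # JIS X 0208 characters take two bytes each
        else:
            i += 1      # US-ASCII / JIS X 0201 Roman take one byte each
    if start < len(data):
        segments.append((charset, data[start:]))
    return segments


# Python's built-in "iso2022_jp" codec inserts the escape sequences.
sample = "Debian は自由".encode("iso2022_jp")
for charset, payload in split_segments(sample):
    print(charset, payload)
```

The two-byte payloads consist only of printable 7-bit values, which is
why they cannot be told apart from ASCII text without tracking the
escape sequences.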

For more information, you can see RFC1468: Japanese Character Encoding 
for Internet Messages http://www.rfc-editor.org/rfc/rfc1468.txt

> For the dual-byte folks, I don't think this will work.  The upstream
> author is willing to work with you, but he's not sure how to do it.
> Actually it may work... if you put both bytes into the charset.
> Depends on what your whitespace looks like.

Whitespace in iso-2022-jp is 0x20, plus the sequence "0x21 0x21" ("!!")
when it appears between "0x1B 0x24 0x42" ("ESC $ B") or
"0x1B 0x24 0x40" ("ESC $ @") and "0x1B 0x28 0x42" ("ESC ( B") or
"0x1B 0x28 0x4A" ("ESC ( J").
However, the Japanese language doesn't use whitespace to separate words.
This is why we need a tool such as chasen - Japanese Morphological
Analysis System.
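
As a rough sketch of the whitespace rule above (again in Python, with
the hypothetical function name whitespace_offsets, and assuming
well-formed input that uses only the four escape sequences named here),
a scanner has to track the current state before it can decide whether
"!!" is whitespace:

```python
TWO_BYTE = (b"\x1b$@", b"\x1b$B")   # switch to JIS X 0208
ONE_BYTE = (b"\x1b(B", b"\x1b(J")   # back to US-ASCII / JIS X 0201 Roman


def whitespace_offsets(data: bytes):
    """Return the byte offsets of whitespace in an iso-2022-jp string."""
    offsets = []
    two_byte = False    # text starts in US-ASCII
    i = 0
    while i < len(data):
        esc = data[i:i + 3]
        if esc in TWO_BYTE:
            two_byte = True
            i += 3
        elif esc in ONE_BYTE:
            two_byte = False
            i += 3
        elif two_byte:
            if data[i:i + 2] == b"!!":   # 0x21 0x21 in the two-byte state
                offsets.append(i)
            i += 2
        else:
            if data[i] == 0x20:          # ordinary space
                offsets.append(i)
            i += 1
    return offsets


# "!!" outside the escape sequences is just two exclamation marks.
text = b"a !!\x1b$B$O!!$O\x1b(B!!"
print(whitespace_offsets(text))   # prints [1, 9]
```

This only finds whitespace bytes; it does not find word boundaries,
which is the part that needs morphological analysis.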

Regards,
Fumitoshi UKAI
