
Re: search.debian.org is online



Hi,

Some additions.

From: Tomohiro KUBOTA <debian@tmail.plala.or.jp>
Subject: Re: search.debian.org is online
Date: Mon, 30 Dec 2002 19:53:31 +0900 (JST)

> > > Note that, if this problem is fixed, Korean people will benefit very
> > > much even if the word-separation problem is not fixed.
> > I don't understand.  Are you saying that Korean uses two-byte characters
> > but doesn't have spaces in words and should be ok now?
> 
> The current version of Debian search site has two problems for east Asian
> languages:

I meant that Korean uses two-byte characters but DOES have spaces between
words, so it should be OK now.  (Chinese and Japanese use two-byte characters
and DON'T have spaces between words.)

> However, two-byte search doesn't always fail.  For example, I reported
> in http://lists.debian.org/debian-www/2002/debian-www-200212/msg00256.html
> that I can search my name.  I guess the condition when a search succeeds
> or fails depends on whether the Japanese word is written in normal EUC-JP
> encoding or in HTML "&#xxxx;" expression where xxxx is UTF-8 codepoint.
> When the word is written in "&#xxxx;" expression, the search succeeds
> while the word is written in normal EUC-JP encoding, the search fails.

s/EUC-JP/ISO-2022-JP/

Note that the Japanese WML sources are written in either EUC-JP or ISO-2022-JP.
However, the Japanese HTML pages on the Debian web site are all written in
ISO-2022-JP.
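
To illustrate why the representation could matter to a search engine, here
is a minimal Python sketch showing how one Japanese word looks in each of
the three forms mentioned above.  The sample word and the variable names
are my own assumptions for illustration; this says nothing about how the
search engine on search.debian.org actually indexes pages.

```python
# Illustrative sketch only: the sample word is an assumption, not taken
# from any Debian page.
word = "日本語"

# EUC-JP: every byte of a kanji has the high bit set (>= 0x80).
euc_jp = word.encode("euc_jp")

# ISO-2022-JP: 7-bit bytes only, wrapped in escape sequences that switch
# the character set (ESC $ B ... ESC ( B).
iso_2022_jp = word.encode("iso2022_jp")

# HTML numeric character references: plain ASCII built from the Unicode
# code point of each character, in decimal form.
ncr = "".join(f"&#{ord(ch)};" for ch in word)

print(euc_jp)
print(iso_2022_jp)
print(ncr)
```

The same word produces three different byte sequences, so an indexer that
does not decode each form back to the same characters would match only
some of them.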

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/



