
Re: Status of new search engine



Hi,

Sorry for sending so many mails.

From: csmall@enc.com.au (Craig Small)
Subject: Re: Status of new search engine
Date: Tue, 17 Dec 2002 22:16:51 +1100

> Ah google does it right, let's see then.
> Search for you,which ifthis email client doesn't mangle it should be
> ??? ??  (looks like question marks to me).
> 
> Now, if I pick it up from the search page, I get
> http://search.debian.org/new/search.en.cgi?q=%E4%B9%85%E4%BF%9D%E7%94%B0+%E6%99%BA%E5%BA%83
> and results look sensible.
> 
> I then searched ???????? which is something to do with security
> and got
> http://search.debian.org/new/search.en.cgi?q=%E3%82%BB%E3%82%AD%E3%83%A5%E3%83%AA%E3%83%86%E3%82%A3%E6%83%85%E5%A0%B1&ps=10&o=0&m=and&lang=
> with no results
> 
> and
> http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%E3%82%BB%E3%82%AD%E3%83%A5%E3%83%AA%E3%83%86%E3%82%A3%E6%83%85%E5%A0%B1&btnG=Google+Search
> with lots of results.
> 
> I don't understand why its not giving the right results.

I said earlier that sentence analysis (splitting text into words) was
probably the reason, but that may be wrong.  If it were the reason, a
Japanese word that happens to be delimited by whitespace or HTML tags
should still be found.  However, that is not the case.

For example, http://www.debian.org/index.en.html has a word "News"
and each translated page has a translated word for "News".

I searched for "News" in English, Russian, and Greek, and it worked well.

http://search.debian.org/new/search.cgi?q=News&ps=10&o=0&m=and&lang=

http://search.debian.org/new/search.cgi?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=and&lang=

http://search.debian.org/new/search.cgi?q=%CE%9D%CE%AD%CE%B1&ps=10&o=0&m=and&lang=

On the other hand, I searched for "News" in Japanese, Korean, and
Chinese, and each search returned zero results.

http://search.debian.org/new/search.cgi?q=%E3%83%8B%E3%83%A5%E3%83%BC%E3%82%B9&ps=10&o=0&m=and&lang=

http://search.debian.org/new/search.cgi?q=%EC%83%88%EC%86%8C%EC%8B%9D&ps=10&o=0&m=and&lang=

http://search.debian.org/new/search.cgi?q=%E6%9C%80%E6%96%B0%E6%B6%88%E6%81%AF&ps=10&o=0&m=and&lang=
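For reference, the q= values in the URLs above can be decoded to check
which word each search actually sent.  This is only a minimal sketch: it
assumes the query strings are UTF-8 percent-encoded (which is how they
look), and query_term is a helper name I made up for illustration.  It
says nothing about how the search engine itself decodes its input.

```python
from urllib.parse import urlsplit, parse_qs

BASE = "http://search.debian.org/new/search.cgi?q="
TAIL = "&ps=10&o=0&m=and&lang="

# The six "News" searches from this mail, in the order they appear.
URLS = [
    BASE + "News" + TAIL,
    BASE + "%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8" + TAIL,
    BASE + "%CE%9D%CE%AD%CE%B1" + TAIL,
    BASE + "%E3%83%8B%E3%83%A5%E3%83%BC%E3%82%B9" + TAIL,
    BASE + "%EC%83%88%EC%86%8C%EC%8B%9D" + TAIL,
    BASE + "%E6%9C%80%E6%96%B0%E6%B6%88%E6%81%AF" + TAIL,
]


def query_term(url: str) -> str:
    """Return the decoded q parameter (parse_qs assumes UTF-8 by default)."""
    params = parse_qs(urlsplit(url).query)
    return params["q"][0]


if __name__ == "__main__":
    for url in URLS:
        term = query_term(url)
        # Print code points rather than raw characters so the output
        # survives terminals and mail clients that mangle non-ASCII text.
        print(" ".join(f"U+{ord(ch):04X}" for ch in term))
```

The first three terms decode to Latin, Cyrillic, and Greek letters, and
the last three to Katakana, Hangul, and Han characters, so the failing
searches are exactly the CJK ones.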

Note that some Japanese words such as

http://search.debian.org/new/search.cgi?q=%E4%B9%85%E4%BF%9D%E7%94%B0&ps=10&o=0&m=and&lang=

and

http://search.debian.org/new/search.cgi?q=%E6%97%A5%E6%9C%AC%E8%AA%9E&ps=10&o=0&m=and&lang=

are written as numeric character references (&#****;) in the HTML
(for example, http://www.debian.org/intl/index.ja.html ).  These come
from webwml/japanese/po/langs.ja.po .  However, I could not find
where my name (%E4%B9%85%E4%BF%9D%E7%94%B0) comes from.
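To make the two representations concrete, here is a small sketch of how
the same word looks as a UTF-8 percent-encoded query and as numeric
character references in HTML.  The helper name to_ncr is my own, and this
only shows that both forms denote the same string; whether the indexer
really treats them differently is my guess, not something I have
confirmed.

```python
import html
from urllib.parse import unquote


def to_ncr(text: str) -> str:
    """Write every non-ASCII character as a decimal &#NNNN; reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# The second query from the URLs above, decoded from UTF-8 percent-encoding.
word = unquote("%E6%97%A5%E6%9C%AC%E8%AA%9E")

# The same word as it would appear in a page that uses &#NNNN; references.
ncr = to_ncr(word)

# Decoding the references gives back exactly the string that was searched.
assert html.unescape(ncr) == word
```

If the indexer decodes such references correctly but mishandles raw
multibyte text in pages, or in the query itself, that would match what I
see: only words stored in the &#****; form can be found.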

So I suspect that the pre-conversion of input for the search engine
has some problem.

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/



