
Re: Status of new search engine


Sorry for sending many mails....

From: csmall@enc.com.au (Craig Small)
Subject: Re: Status of new search engine
Date: Tue, 17 Dec 2002 22:16:51 +1100

> Ah google does it right, let's see then.
> Search for you, which if this email client doesn't mangle it should be
> ??? ??  (looks like question marks to me).
> Now, if I pick it up from the search page, I get
> http://search.debian.org/new/search.en.cgi?q=%E4%B9%85%E4%BF%9D%E7%94%B0+%E6%99%BA%E5%BA%83
> and results look sensible.
> I then searched ???????? which is something to do with security
> and got
> http://search.debian.org/new/search.en.cgi?q=%E3%82%BB%E3%82%AD%E3%83%A5%E3%83%AA%E3%83%86%E3%82%A3%E6%83%85%E5%A0%B1&ps=10&o=0&m=and&lang=
> with no results
> and
> http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%E3%82%BB%E3%82%AD%E3%83%A5%E3%83%AA%E3%83%86%E3%82%A3%E6%83%85%E5%A0%B1&btnG=Google+Search
> with lots of results.
> I don't understand why it's not giving the right results.
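
Since the mail client shows the query terms as question marks, the
percent-encoded URLs are the only reliable record of what was searched.
A minimal sketch for recovering the terms, assuming the `q` parameter
is UTF-8 (the Google URL says so via `ie=UTF-8`; for search.debian.org
this is my assumption). The helper name `query_term` is made up for
illustration.

```python
from urllib.parse import urlsplit, parse_qs

def query_term(url):
    """Return the decoded 'q' parameter of a search URL (UTF-8 assumed)."""
    params = parse_qs(urlsplit(url).query, encoding="utf-8")
    return params["q"][0]

name_url = ("http://search.debian.org/new/search.en.cgi"
            "?q=%E4%B9%85%E4%BF%9D%E7%94%B0+%E6%99%BA%E5%BA%83")
security_url = ("http://search.debian.org/new/search.en.cgi"
                "?q=%E3%82%BB%E3%82%AD%E3%83%A5%E3%83%AA%E3%83%86"
                "%E3%82%A3%E6%83%85%E5%A0%B1&ps=10&o=0&m=and&lang=")

print(query_term(name_url))      # 3 characters, a space, 2 characters
print(query_term(security_url))  # 8 characters, matching the 8 question marks
```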

I said earlier that sentence analysis (splitting Japanese text into
words) was probably the reason, but that may be wrong.  If it were the
reason, a Japanese word that happens to be delimited by whitespace or
HTML tags should still be found.  However, it is not.
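
To illustrate the reasoning: an indexer that only splits on whitespace
and tags (no Japanese word segmentation) would still produce a usable
token for a word that stands alone inside a tag, but not for the same
word inside a running sentence.  This is only a sketch; the `tokenize`
function and the sample HTML strings are made up, and I do not know how
the real search engine tokenizes.

```python
import re

def tokenize(html_text):
    """Naive tokenizer: treat tags as separators, then split on whitespace."""
    text = re.sub(r"<[^>]*>", " ", html_text)
    return text.split()

# Hypothetical page fragments, not copied from www.debian.org.
standalone = '<li><a href="/News/">ニュース</a></li><li>Debian について</li>'
in_sentence = "<p>最新のニュースをお知らせします</p>"

print("ニュース" in tokenize(standalone))   # True: delimited by tags
print("ニュース" in tokenize(in_sentence))  # False: buried in one long token
```

If segmentation were the only problem, the first kind of word should be
searchable.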

For example, http://www.debian.org/index.en.html contains the word
"News", and each translated page contains the translation of "News".

I searched for "News" in English, Russian, and Greek, and each search
returned results.




On the other hand, I searched for "News" in Japanese, Chinese, and
Korean, and each search returned zero results.




Note that some Japanese words such as




are written as &#****; numeric character references in the HTML.  (For
example, http://www.debian.org/intl/index.ja.html ).  They come from
webwml/japanese/po/langs.ja.po .  However, I could not find where my
name (%E4%B9%85%E4%BF%9D%E7%94%B0) comes from.
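
To show why the &#****; form matters: if the indexer stores the page
text without decoding numeric character references, a UTF-8 query can
never match it.  A minimal sketch using Python's standard `html`
module; the sample string is made up and is not the actual content of
langs.ja.po.

```python
import html

# Hypothetical sample: a Japanese word written as numeric character references.
raw = "&#26085;&#26412;&#35486;"
decoded = html.unescape(raw)

query = "日本語"
print(query in raw)      # False: the raw HTML holds only ASCII references
print(query in decoded)  # True: matches once the references are decoded
```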

So I suspect that the problem lies in the conversion applied to text
before it is fed to the search engine.

Tomohiro KUBOTA <kubota@debian.org>
