
Re: Status of new search engine



Hi,

From: csmall@enc.com.au (Craig Small)
Subject: Status of new search engine
Date: Tue, 17 Dec 2002 19:30:54 +1100

> Hello,
>   After that last request I decided it was time to get serious about the
> searching and have now got it going, sort of.

Thank you for your efforts.

>   - This one is using UTF-8, it should mean the charsets that were not
>     supported before are now, including 2byte ones.

For a Japanese speaker like me, this IS everything.  I am really
looking forward to this becoming available.


I tested the new page.


1.

I searched for "kubota", which is my name, with Language: "Any".
There were 150 results.  The second one is a Japanese page,
   http://www.debian.org/devel/website/translation_coordinators.ja.html
However, the result page displays its title and other fields as
garbled text.  I found that the search page
http://search.debian.org/new/search.cgi has the following line:
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
which is apparently wrong.  The page should declare UTF-8.  Of course,
the result items must also be converted into UTF-8.  (Since the search
result page has to display various languages in one page, the encoding
must be UTF-8.)
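
As a rough illustration of the conversion I mean, here is a minimal
Python sketch.  The function name to_utf8 and the sample titles are
made up for illustration, and I am assuming the indexer knows the
source charset of each page; I do not know how the actual search
engine stores its results.

```python
# Hypothetical sketch: normalise result titles to UTF-8 so that pages in
# different charsets can be shown together on one result page.

# The header the result page would need instead of charset=iso-8859-1.
META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode bytes from source_charset into UTF-8.

    Undecodable bytes are replaced with U+FFFD rather than raising,
    so one broken page cannot break the whole result list.
    """
    return raw.decode(source_charset, errors="replace").encode("utf-8")


# Sample titles in two assumed source charsets.
title_ja = "セキュリティ".encode("euc_jp")
title_de = "Übersetzung".encode("iso-8859-1")

results = [to_utf8(title_ja, "euc_jp"), to_utf8(title_de, "iso-8859-1")]
page = META + "\n" + "\n".join(r.decode("utf-8") for r in results)
```

Once every item is UTF-8, the Japanese and German titles can sit in
the same page without any per-item charset switching.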


2.

I searched for "Linux" with Language: "Any".  It returned 112950 hits.
However, with any other language, such as "English", "German", and
"Japanese", the result was zero.


3.

I searched for a Japanese word which means "security" with
Language: "Any".  The result was zero.

I checked the URL of the result page.  It was:

http://search.debian.org/new/search.cgi?q=%A5%BB%A5%AD%A5%E5%A5%EA%A5%C6%A5%A3&ps=10&o=0&m=and&lang=

The variable "q" seems to be the word in EUC-JP encoding.  (I used the
text browser "w3m" on Debian sid, in the ja_JP.eucJP locale.)

On the other hand, when I used Microsoft Windows and Internet Explorer,
the URL was:

http://search.debian.org/new/search.cgi?q=%26%2312475%3B%26%2312461%3B%26%2312517%3B%26%2312522%3B%26%2312486%3B%26%2312451%3B&ps=10&o=0&m=and&lang=

I don't know how the value of the variable "q" is encoded here.

Thus, it might be difficult to handle various international input
from webforms.  However, I believe it is possible because Google
does it well.
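
For what it is worth, both values of "q" above can be decoded to the
same word.  The w3m value is percent-encoded EUC-JP bytes, and the
Internet Explorer value looks like percent-encoded HTML numeric
character references (&#12475; and so on), which browsers tend to
send when the form page's charset cannot represent the typed
characters.  The following Python sketch shows one way a CGI might
normalise such input.  The function name decode_query and the EUC-JP
fallback are my assumptions, not how search.cgi actually works.

```python
import html
from urllib.parse import unquote_to_bytes

# The two "q" values taken from the URLs above.
Q_W3M = "%A5%BB%A5%AD%A5%E5%A5%EA%A5%C6%A5%A3"
Q_IE = ("%26%2312475%3B%26%2312461%3B%26%2312517%3B"
        "%26%2312522%3B%26%2312486%3B%26%2312451%3B")


def decode_query(raw_q: str, fallback_charset: str = "euc_jp") -> str:
    """Turn a percent-encoded query value into a Unicode string.

    Try UTF-8 first, then an assumed legacy charset, and finally
    expand numeric character references such as "&#12475;".
    """
    raw = unquote_to_bytes(raw_q)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Hypothetical fallback; a real form would need a better hint,
        # for example a hidden field naming the charset.
        text = raw.decode(fallback_charset)
    # Note: this also expands entities that a user typed literally.
    return html.unescape(text)


word_from_w3m = decode_query(Q_W3M)
word_from_ie = decode_query(Q_IE)
same_word = word_from_w3m == word_from_ie
```

Guessing the charset from the bytes alone is fragile, so serving the
form itself as UTF-8 would probably reduce the number of cases the
CGI has to handle.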

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/



