Re: Status of new search engine



On Tue, Dec 17, 2002 at 10:29:49PM +0900, Tomohiro KUBOTA wrote:
> From: Tomohiro KUBOTA <debian@tmail.plala.or.jp>
> Subject: Re: Status of new search engine
> Date: Tue, 17 Dec 2002 20:46:36 +0900 (JST)
> 
> > I heard that "namazu" can be used for such purpose, i.e., constructing
> > a whole-text search engine for Japanese.  It is a free software and
> > available as a Debian package.  Namazu is very popular not only among
> > Japanese free software community but also among commercial usages.
> 
> Sorry, this is not exact.  Namazu is a search engine but it doesn't
> have sentence analyzer.  It needs external softwares such as
> ChaSen http://chasen.aist-nara.ac.jp/index.html.en or
> Kakasi http://kakasi.namazu.org/index.html.en .  If your search engine
> has some mechanism to expand word-separation procedure, you may want
> to ask someone about these softwares.  I expect there are Debian
> members from Japan who know well on these softwares.

Namazu is OK, but it has some problems that make it unsuitable for
the Debian website. For starters, it had no English documentation at
all when I was looking at it.

The mnogosearch people are aware of this problem and say that 3.2.8
will have proper support and that it uses ChaSen. Although I'm not
fully up on Japanese charsets, that should mean it will handle
Japanese text properly.

> There are several languages in the world whose sentence doesn't use
> whitespace to separate words.  For example,  "Thereareseverallanguages

That's the problem: it's breaking the Japanese words up. You can see
that with security, which it treats as two words instead of one.
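
To illustrate the splitting problem, here is a minimal Python sketch of
greedy longest-match segmentation on text with no whitespace. The
function name and the tiny word lists are made up for this example, and
it is not a description of how mnogosearch, ChaSen or Kakasi work
internally. It only shows how a segmenter whose word list lacks an entry
ends up cutting one word into two.

```python
def segment(text, lexicon):
    """Split text into words by taking the longest lexicon entry at each position."""
    words = []
    longest = max(len(word) for word in lexicon)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                break
        else:
            # Nothing matched: emit the single character on its own.
            piece = text[i]
        words.append(piece)
        i += len(piece)
    return words


# Hypothetical word lists. The second one is missing "languages".
COMPLETE = {"there", "are", "several", "languages", "language", "s"}
INCOMPLETE = {"there", "are", "several", "language", "s"}

print(segment("thereareseverallanguages", COMPLETE))
print(segment("thereareseverallanguages", INCOMPLETE))
```

With the complete list the last word comes out whole. With the
incomplete one it is split into "language" and "s", which is the same
kind of error as one Japanese word being indexed as two.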

For those worried about language selection, an strace run has turned
up a bug in the CGI around where it should be looking for the language
selection files.

I've now completely indexed security, devel and dist. News is being
done, and a fair few of the others are complete too. Over 30,000 known
pages, 5,000 to go.

  - Craig
-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.enc.com.au/                <csmall@enc.com.au>
MIEEE <csmall@ieee.org>                 Debian developer <csmall@debian.org>


