
Re: Status of new search engine



Hi,

From: Tomohiro KUBOTA <debian@tmail.plala.or.jp>
Subject: Re: Status of new search engine
Date: Tue, 17 Dec 2002 20:46:36 +0900 (JST)

> I heard that "namazu" can be used for such a purpose, i.e., constructing
> a full-text search engine for Japanese.  It is free software and is
> available as a Debian package.  Namazu is very popular not only in the
> Japanese free software community but also in commercial use.

Sorry, that was not accurate.  Namazu is a search engine, but it doesn't
have its own sentence analyzer for splitting Japanese text into words.
It relies on external software such as
ChaSen http://chasen.aist-nara.ac.jp/index.html.en or
Kakasi http://kakasi.namazu.org/index.html.en .  If your search engine
has a mechanism for plugging in a custom word-separation procedure, you
may want to ask someone about these programs.  I expect there are Debian
members from Japan who know them well.
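A rough sketch of what such a plug-in mechanism for word separation might look like: an inverted index whose tokenizer is a replaceable function. Everything here (`TinySearchIndex`, `whitespace_tokenizer`, `char_tokenizer`) is hypothetical and is not Namazu's API. `char_tokenizer` is only a crude stand-in for a real analyzer such as ChaSen or Kakasi; this sketch does not call either program.

```python
from typing import Callable, Dict, List, Set

# Hypothetical hook type: a tokenizer turns raw text into words to index.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Default splitter: only works for languages that put spaces between words."""
    return text.lower().split()


def char_tokenizer(text: str) -> List[str]:
    """Crude stand-in for an external analyzer (NOT ChaSen or Kakasi):
    treats every non-space character as its own token."""
    return [c for c in text if not c.isspace()]


class TinySearchIndex:
    """Hypothetical inverted index whose word-separation step is pluggable."""

    def __init__(self, tokenizer: Tokenizer = whitespace_tokenizer) -> None:
        self.tokenizer = tokenizer
        self.postings: Dict[str, Set[int]] = {}

    def add(self, doc_id: int, text: str) -> None:
        # Record which documents contain each token.
        for word in self.tokenizer(text):
            self.postings.setdefault(word, set()).add(doc_id)

    def search(self, query: str) -> Set[int]:
        # Split the query with the same tokenizer, then return the
        # documents that contain every query token (AND semantics).
        words = self.tokenizer(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


if __name__ == "__main__":
    idx = TinySearchIndex()
    idx.add(1, "Debian is a free operating system")
    idx.add(2, "Thereareseverallanguages")
    print(idx.search("free system"))  # {1}
    print(idx.search("languages"))    # set(): doc 2 was never split into words

    ja = TinySearchIndex(tokenizer=char_tokenizer)
    ja.add(1, "日本語")
    print(ja.search("日本"))  # {1}
```

The whitespace default cannot find "languages" inside an unsegmented string; that gap is what an external analyzer is meant to fill. The character-level stand-in finds something, but it ignores word boundaries entirely and so over-matches.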

There are several languages in the world whose sentences don't use
whitespace to separate words, for example:
"Thereareseverallanguagesintheworldwhosesentencesdon'tusewhitespacetoseparatewords."
Among the languages of the Debian website, Japanese and Chinese work
this way.  Thai does too, though it is not yet available on the Debian
website.  Although Korean is similar to Japanese, modern Korean does
use whitespace between words.
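To make the word-separation problem concrete, here is a minimal sketch of greedy longest-match ("maximum matching") segmentation applied to an English example like the one above. This is a simple textbook technique chosen for illustration, not how ChaSen or Kakasi work; the function name `max_match` and the small vocabulary are hypothetical.

```python
from typing import List, Set


def max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy longest-match word segmentation (illustrative only).

    At each position, take the longest dictionary word that starts there;
    if no dictionary word matches, emit a single character and move on.
    """
    longest = max((len(w) for w in dictionary), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest; length 1 always
        # succeeds, so the loop is guaranteed to make progress.
        for length in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                words.append(candidate)
                i += length
                break
    return words


if __name__ == "__main__":
    vocabulary = {"there", "are", "several", "languages", "in", "the", "world"}
    print(max_match("thereareseverallanguagesintheworld", vocabulary))
    # ['there', 'are', 'several', 'languages', 'in', 'the', 'world']
```

Greedy matching is easy to fool: with the vocabulary {"the", "there", "remainder"}, the string "theremainder" comes out as "there" followed by leftover single characters instead of "the" + "remainder". Resolving such ambiguity needs richer dictionaries and scoring, which is part of what dedicated analyzers such as ChaSen are for.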

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
