Re: Test search engine

To: NOKUBI Takatsugu <knok@daionet.gr.jp>
Cc: debian-www@lists.debian.org
Subject: Re: Test search engine
From: csmall@eye-net.com.au (Craig Small)
Date: Wed, 13 Sep 2000 08:08:00 +1100
Message-id: <20000913080800.A5142@eye-net.com.au>
Mail-followup-to: NOKUBI Takatsugu <knok@daionet.gr.jp>, debian-www@lists.debian.org
In-reply-to: <200009120844.RAA10871@ns1.eal.or.jp>; from knok@daionet.gr.jp on Tue, Sep 12, 2000 at 05:44:37PM +0900
References: <20000908164305.A6425@eye-net.com.au> <200009120844.RAA10871@ns1.eal.or.jp>

On Tue, Sep 12, 2000 at 05:44:37PM +0900, NOKUBI Takatsugu wrote:
> I did not check umdsearch, however, it should need "word segmentation"
> process for some languages (like Japanese).
> There are no space between words in some languages. Therefore a
> boundary of words is not clear in such languages.
> 
> kakasi and chasen can segment Japanese words. I don't know about other
> languages...
It determines a word using a list of the characters that can make up a
word, something like [A-Za-z0-9]. A word is then a run of in-list
characters with a not-in-list character on either side:
not-in-list, in-list, not-in-list

So if there is some byte sequence that equates to a space, you make
sure it is not in the character list, and udmsearch says that's where
the word ends.
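
The rule above can be sketched in a few lines of Python. This is only an
illustration, not udmsearch's actual code: the name split_words and the
WORD_CHARS set are made up for the example, and the character list is
assumed to be the [A-Za-z0-9] one mentioned above.

```python
import string

# Assumed character list, taken from the "[A-Za-z0-9]" example.
# udmsearch's real list may differ.
WORD_CHARS = set(string.ascii_letters + string.digits)


def split_words(text, word_chars=WORD_CHARS):
    """Split text into words: maximal runs of in-list characters.

    Any character that is not in word_chars ends the current word,
    which is the "not-in-list, in-list, not-in-list" rule.
    """
    words = []
    current = []
    for ch in text:
        if ch in word_chars:
            current.append(ch)
        elif current:
            # A not-in-list character closes the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        # The end of the text also closes a word.
        words.append("".join(current))
    return words


print(split_words("Test search-engine, v2!"))
```

With this list, characters outside [A-Za-z0-9] never start a word, so
text written entirely in another script produces no words at all
unless its characters are added to the list.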

I tried at least one of the search engines mentioned; it printed all
of its errors in (I assume) Japanese.

  - Craig
-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.eye-net.com.au/        <csmall@eye-net.com.au>
MIEEE <csmall@ieee.org>                 Debian developer <csmall@debian.org>
