
Re: Test search engine



On Tue, Sep 12, 2000 at 05:44:37PM +0900, NOKUBI Takatsugu wrote:
> I did not check udmsearch; however, it would need a "word segmentation"
> process for some languages (like Japanese).
> There are no spaces between words in some languages. Therefore the
> boundaries between words are not clear in such languages.
> 
> kakasi and chasen can segment Japanese words. I don't know about other
> languages...

The way udmsearch determines a word is that it has a list of characters
that can make up a word, something like [A-Za-z0-9]. A word is then a
run of in-list characters with a not-in-list character on either side.

So if there is some byte sequence that equates to a space, you make
sure it is not in the character list, and udmsearch treats that as the
place where the word ends.
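
As a rough sketch of that rule (this is my own illustration in Python,
not udmsearch's actual code; the names WORD_CHARS and split_words are
made up, and the character list is just the [A-Za-z0-9] example):

```python
import string

# Assumed character list, equivalent to [A-Za-z0-9]. The real list is
# whatever the search engine is configured with.
WORD_CHARS = frozenset(string.ascii_letters + string.digits)


def split_words(text, word_chars=WORD_CHARS):
    """Return the maximal runs of in-list characters found in text."""
    words = []
    current = []
    for ch in text:
        if ch in word_chars:
            # Still inside a word: keep collecting characters.
            current.append(ch)
        elif current:
            # A not-in-list character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        # The end of the input also ends a word.
        words.append("".join(current))
    return words


print(split_words("Test search engine, v2!"))
```

With this approach, text that has no separator between words only
splits correctly if something outside the list sits at each boundary.
If every character of a sentence is in the list, the whole sentence
comes back as a single "word", which is the segmentation problem
described above.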

I tried at least one of the tools mentioned; it printed all of its
error messages in (I assume) Japanese.

  - Craig
-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.eye-net.com.au/        <csmall@eye-net.com.au>
MIEEE <csmall@ieee.org>                 Debian developer <csmall@debian.org>


