
Re: Test search engine



In article <20000908164305.A6425@eye-net.com.au>
csmall@eye-net.com.au writes:

>> For the double-byte character guys there some bad news, it apparently
>> doesn't handle these yet.

I have not checked umdsearch, but it would need a "word segmentation"
step for some languages (such as Japanese).
Some languages put no spaces between words, so word boundaries are
not explicit in the text and an indexer cannot find them by splitting
on whitespace.
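
As a rough illustration of the problem (this is only a sketch, not how
umdsearch, kakasi or chasen work): splitting on whitespace gives usable
index terms for English but returns a whole Japanese sentence as one
token. A naive longest-match lookup against a word list recovers the
words. The function names, the tiny LEXICON and the sample sentence
are all made up for this example; real segmenters use much larger
dictionaries and grammatical information.

```python
def whitespace_tokens(text):
    """Split on whitespace, as a simple indexer for English might."""
    return text.split()


def longest_match_segment(text, lexicon):
    """Greedy longest-match segmentation against a word list.

    At each position, take the longest lexicon entry that starts there.
    If nothing matches, fall back to a single character.
    """
    max_len = max(map(len, lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy word list, for illustration only.
LEXICON = {"検索", "エンジン", "を", "試す"}

english = "test search engine"
japanese = "検索エンジンを試す"

print(whitespace_tokens(english))    # three tokens
print(whitespace_tokens(japanese))   # one token: the whole sentence
print(longest_match_segment(japanese, LEXICON))
```

Greedy matching is easy to write but picks wrong boundaries when a
longer dictionary entry overlaps the intended word, which is one
reason a dedicated tool is preferable.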

kakasi and chasen can segment Japanese words. I don't know about other
languages...

If a multilingual word segmentation tool becomes available, an i18n
search engine could be built.
-- 
NOKUBI Takatsugu
E-mail: knok@daionet.gr.jp
	knok@debian.or.jp (Debian-JP)


