
Re: enable searching East Asian words at search.debian.org



Hi,

From: Josip Rodin <joy@srce.hr>
Subject: Re: enable searching East Asian words at search.debian.org
Date: Sun, 11 May 2003 14:33:38 +0200

> make: Entering directory `/org/www.debian.org/webwml/japanese/searchtmpl'
> wml -q -D CUR_YEAR=2003 -o UNDEFuJA:search.ja.html.tmp@g+w --prolog="/usr/bin/kcc -e -" --epilog="../convert search.ja.html" search.wml
> c=`grep CHARSET ../.wmlrc | cut -d= -f2`; \
>           iconv -f $c -t UTF-8 search.ja.html | perl -pe 's,^(\s*<meta http-equiv="Content-Type" content="text/html; charset=)\S+(">)$,$1UTF-8$2,' > search.ja.html
> iconv: cannot open input file `euc-jp': No such file or directory
> copying search.ja.html to ../../../www/searchtmpl
> make: Leaving directory `/org/www.debian.org/webwml/japanese/searchtmpl'

Sorry, I don't understand what you are doing.  However, my "improvement"
is not related to search.ja.html (or to the translation of the search
page) at all.

My intention is to make it possible to search for, say, "Bunsho" (in
Kanji), which means "documentation" in Japanese, from the search page.
This should be possible because the Debian site has many pages
translated into Japanese, and these pages should be searchable.  It is
not about translating the search page.  (I guess you are trying to
prepare a Japanese translation of the search page?  I will look into
this point later.  However, please note that, for Japanese people, a
search page in English that can search for Japanese words is far better
than a search page in Japanese that cannot.)

The problems are:

(1) Though mnogosearch is based on UTF-8 (and should be able to process
all the languages that the Debian web pages are translated into), its
support for CJK languages is disabled.  (Please read the ./configure
--help output or the installation instructions of mnogosearch.)
Disabling it just drops the character code mapping tables between the
CJK encodings and UTF-8.  This is why mnogosearch needs to be
recompiled.

(2) Japanese and Chinese don't put whitespace between "words", so
indexing (i.e., reading all web pages and storing all their "words" in
a database for searching) doesn't work well.  The chasen-related
packages are needed to fix this.  (I hope you have read the mails in
which I wrote that chasen is needed.  Please just go back through this
thread.)
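
To illustrate both problems, here is a minimal sketch in Python.  It is
not mnogosearch or chasen code, and all function names are hypothetical.
It assumes Python's built-in "euc_jp" codec as a stand-in for the
mapping tables in (1), and it uses plain character bigrams as a crude
stand-in for the morphological analysis that chasen performs in (2).

```python
# Illustrative sketch only: not mnogosearch or chasen code.
# All function names here are hypothetical.
from typing import List


def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Transcode bytes to UTF-8.

    This only works if a mapping table for source_charset is available,
    which is the part that problem (1) says is missing.
    """
    return raw.decode(source_charset).encode("utf-8")


def whitespace_tokens(text: str) -> List[str]:
    """Split text into "words" on whitespace, as a naive indexer might."""
    return text.split()


def char_bigrams(text: str) -> List[str]:
    """Overlapping two-character chunks; a crude substitute for a real
    segmenter, used here only to show that some segmentation is needed."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


if __name__ == "__main__":
    # (1) The same word has different bytes in EUC-JP and in UTF-8.
    euc = "文書".encode("euc_jp")
    print(len(euc), len(to_utf8(euc, "euc_jp")))

    # (2) Whitespace splitting works for English but not for Japanese.
    print(whitespace_tokens("search the documentation"))
    print(whitespace_tokens("文書を検索する"))

    # With some segmentation, "文書" (Bunsho) becomes a findable unit.
    print("文書" in char_bigrams("文書を検索する"))
```

The whole Japanese sentence comes back as a single "word" from the
whitespace split, so a query for "Bunsho" alone would not match it.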

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/



