Re: enable searching East Asian words at search.debian.org
Hi,
From: barbier@linuxfr.org (Denis Barbier)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Mon, 12 May 2003 13:45:08 +0200
> > For example, I can search for a Russian word "Novosti" (of course in
> > Cyrillic)
>
> The point is: how are Cyrillic words passed by the web browser to the
> search engine?
> Are they encoded in ISO-8859-5, KOI8-R or UTF-8 charsets?
UTF-8, i.e., the same encoding as the search page. The previous
example produces this URL:
http://search.debian.org/?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=all&g=
The first 6 bytes read:
%D0%9D -> U+041D (CYRILLIC CAPITAL LETTER EN)
%D0%BE -> U+043E (CYRILLIC SMALL LETTER O)
%D0%B2 -> U+0432 (CYRILLIC SMALL LETTER VE)
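
To check the whole query string rather than only the first 6 bytes, here
is a minimal sketch in Python. It assumes only the standard library; the
variable names are mine and nothing here reflects how search.debian.org
itself processes the query. It percent-decodes the q= value from the URL
above, interprets the bytes as UTF-8, and prints each code point.

```python
import unicodedata
from urllib.parse import unquote_to_bytes

# The q= parameter copied from the URL above.
query = "%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8"

# Step 1: undo the percent-encoding to get the raw bytes the browser sent.
raw = unquote_to_bytes(query)

# Step 2: interpret those bytes as UTF-8 (the encoding of the search page).
word = raw.decode("utf-8")

# Step 3: list each character with its code point and Unicode name.
for ch in word:
    print(f"U+{ord(ch):04X} ({unicodedata.name(ch)})")
```

Each Cyrillic letter takes two bytes in UTF-8, so the 14 bytes of the
query decode to the 7 letters of the word.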
---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/