
Re: enable searching East Asian words at search.debian.org



Hi,

From: barbier@linuxfr.org (Denis Barbier)
Subject: Re: enable searching East Asian words at search.debian.org
Date: Mon, 12 May 2003 13:45:08 +0200

> > For example, I can search for a Russian word "Novosti" (of course in
> > Cyrillic)
> 
> The point is: how are Cyrillic words passed by the web browser to the
> search engine?
> Are they encoded in ISO-8859-5, KOI8-R or UTF-8 charsets?

UTF-8, i.e., the same encoding as the search page.  For example,
this is the URL generated for the search above:

http://search.debian.org/?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=all&g=

The first 6 bytes read:

%D0%9D -> U+041D (CYRILLIC CAPITAL LETTER EN)
%D0%BE -> U+043E (CYRILLIC SMALL LETTER O)
%D0%B2 -> U+0432 (CYRILLIC SMALL LETTER VE)
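
If you want to check this yourself, here is a minimal Python sketch that
decodes the q parameter of the URL above as UTF-8 and names the first
three characters.  The helper names (query_term, describe) are mine, not
part of any search.debian.org code, and the last loop only shows how the
same word would be percent-encoded if a browser sent it in KOI8-R or
ISO-8859-5 instead.

```python
import unicodedata
from urllib.parse import parse_qs, quote, urlsplit

# The search URL quoted in this message.
URL = ("http://search.debian.org/"
       "?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=all&g=")


def query_term(url, encoding="utf-8"):
    """Return the q parameter, percent-decoded with the given charset.

    errors="strict" makes a wrong charset guess raise UnicodeDecodeError
    instead of silently producing replacement characters.
    """
    params = parse_qs(urlsplit(url).query, encoding=encoding, errors="strict")
    return params["q"][0]


def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


term = query_term(URL)

# First three characters, i.e. the first 6 bytes of the UTF-8 sequence.
for code_point, name in describe(term[:3]):
    print(code_point, name)

# The same word as it would appear in the URL under other charsets.
for charset in ("utf-8", "koi8-r", "iso-8859-5"):
    print(charset, quote(term, encoding=charset))
```

The "utf-8" line of the last loop reproduces the q value in the URL
above, which is how one can tell which charset the browser used.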

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/



