
Re: enable searching East Asian words at search.debian.org



On Mon, May 12, 2003 at 10:26:30PM +0900, Tomohiro KUBOTA wrote:
> Hi,
> 
> From: barbier@linuxfr.org (Denis Barbier)
> Subject: Re: enable searching East Asian words at search.debian.org
> Date: Mon, 12 May 2003 13:45:08 +0200
> 
> > > For example, I can search a Russian word "Novosti" (of course in
> > > Cyrillic)
> > 
> > The point is: how are Cyrillic words passed by the web browser to the
> > search engine?
> > Are they encoded in ISO-8859-5, KOI8-R or UTF-8 charsets?
> 
> UTF-8, i.e., the same encoding as the search page.  For example,
> the previous example:
> 
> http://search.debian.org/?q=%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8&ps=10&o=0&m=all&g=
> 
> The first 6 bytes read:
> 
> %D0%9D -> U+041D (CYRILLIC CAPITAL LETTER EN)
> %D0%BE -> U+043E (CYRILLIC SMALL LETTER O)
> %D0%B2 -> U+0432 (CYRILLIC SMALL LETTER VE)
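The byte-to-code-point table above can be checked mechanically. Below is a minimal Python 3 sketch. It assumes, as stated above, that the `q` parameter carries percent-encoded UTF-8. The helper name `decode_query_word` is hypothetical and has nothing to do with the search engine itself. For contrast, it also encodes the same word in ISO-8859-5 and KOI8-R, the other two charsets asked about.

```python
from urllib.parse import unquote_to_bytes
import unicodedata

def decode_query_word(raw_q: str) -> str:
    """Hypothetical helper: percent-decode a query value, then read the bytes as UTF-8."""
    return unquote_to_bytes(raw_q).decode("utf-8")

# The "q" value from the search URL quoted above.
word = decode_query_word("%D0%9D%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8")

# Print each character's UTF-8 bytes in %XX form next to its code point and name.
for ch in word:
    utf8 = "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
    print(f"{utf8} -> U+{ord(ch):04X} ({unicodedata.name(ch)})")

# In the single-byte Cyrillic charsets each letter is one byte, and the two differ.
iso = word.encode("iso8859_5")
koi = word.encode("koi8_r")
print(iso.hex(), koi.hex())
```

The first three printed lines reproduce the table in the quoted message. The UTF-8 form takes two bytes per letter, while ISO-8859-5 and KOI8-R use one byte each and disagree with one another. A server therefore has to know which charset the browser used before it can match the word.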

Hmmm, I tend to disagree.  I tried with the French word for 'election',
which is spelled the same ('élection') except that the first 'e' is an e-acute.
In an ISO-8859-15 environment, I enter this word on search.debian.org and
select the French language, and am redirected to
  http://search.debian.org/?q=%C3%A9lection&ps=10&o=0&m=all&g=fr
which gives 46 pages.
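The query in that URL is what you get by encoding 'élection' once as UTF-8, even though the terminal locale is ISO-8859-15. This fits the idea that the browser encodes form input in the page's charset. Here is a minimal Python 3 sketch that uses only `urllib.parse.quote` from the standard library:

```python
from urllib.parse import quote

word = "\u00e9lection"  # "élection": U+00E9 LATIN SMALL LETTER E WITH ACUTE + "lection"

# One UTF-8 encoding step: e-acute becomes the two bytes C3 A9.
utf8_query = quote(word, encoding="utf-8")
print(utf8_query)    # %C3%A9lection

# Encoding in ISO-8859-15 instead would give a single byte for e-acute.
latin9_query = quote(word, encoding="iso8859_15")
print(latin9_query)  # %E9lection
```

Only the UTF-8 form matches the URL Denis was redirected to. That result is consistent with Tomohiro's statement about the search page's encoding.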

If now I run
  $ export LANG=fr_FR.UTF-8
  $ xterm
go to search.debian.org in this window and cut'n'paste this word from
another window, I am redirected to
  http://search.debian.org/?q=%C3%83%C2%A9lection&ps=10&o=0&m=all&g=fr
which means that e-acute has been converted to UTF-8 twice, and no pages are
found.  Am I doing something wrong?
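The doubled sequence `%C3%83%C2%A9` is exactly what appears when the UTF-8 bytes C3 A9 are misread as single-byte Latin-1 characters ('Ã' and '©') and then encoded to UTF-8 again. ISO-8859-15 maps these two bytes to the same characters. This message does not establish which step in the xterm, paste, and browser chain does the extra conversion. The Python 3 sketch below only reproduces the byte pattern and shows that reversing one round recovers the word:

```python
from urllib.parse import quote, unquote_to_bytes

word = "\u00e9lection"  # "élection"

# Correct: a single UTF-8 encoding, e-acute -> C3 A9.
once = word.encode("utf-8")

# Double conversion: the UTF-8 bytes are read back as Latin-1 ("Ã" U+00C3, "©" U+00A9)
# and encoded to UTF-8 a second time (ISO-8859-15 gives the same result for these bytes).
twice = once.decode("latin-1").encode("utf-8")

print(quote(once))   # %C3%A9lection
print(quote(twice))  # %C3%83%C2%A9lection

# Undoing one round of conversion recovers the intended word.
repaired = unquote_to_bytes(quote(twice)).decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == word)  # True
```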

Denis


