
Translators! What's your charset?



I've been chasing up the reason why udmsearch does not index
non-English text too well, and after a chat with the developer it all
comes down to charsets.

English basically has a word charset of [A-Za-z0-9]: easy stuff, and
all 7-bit.  But other languages have other charsets.
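
To make the problem concrete, here is a minimal sketch of a word counter
that only knows the [A-Za-z0-9] rule.  The function names are made up for
illustration; this is not udmsearch code, only an assumption about how an
ASCII-only indexer would behave.

```c
#include <stddef.h>

/* ASCII-only word test: the [A-Za-z0-9] rule, nothing above 7 bits. */
static int is_ascii_word_char(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

/* Count runs of word characters in a NUL-terminated byte string.
 * Any byte that fails the test above ends the current word. */
static size_t count_words_ascii(const char *text)
{
    size_t words = 0;
    int in_word = 0;

    for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
        if (is_ascii_word_char(*p)) {
            if (!in_word) {
                words++;
                in_word = 1;
            }
        } else {
            in_word = 0;
        }
    }
    return words;
}
```

With this rule, the iso-8859-1 bytes for "naïve" (where ï is 0xEF) are
counted as two words, "na" and "ve", because the 8-bit byte is treated as
a separator.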

Charsets I already have are:
Cyrillic: cp1251, koi8r, cp866, iso88595, maccyr
Western: iso-8859-1
Central Europe: iso-8859-2, cp1250
Arabic: cp1256

OK, so all charsets include the ASCII [A-Za-z0-9] but what I need to
know from the translators is their charset for their language.

I need upper case characters first, then lowercase, in the same order.
If a character has no upper/lower equivalent then put it in twice.  If
the language has no concept of upper/lower at all then just include the
set once and let me know it doesn't have upper/lower.
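
The reason for the upper-then-lower layout is that it lets the two halves
be paired up by position.  Here is a rough sketch of that idea, assuming
both halves are the same length and in the same order.  The array
contents and the names example_charset and build_fold_table are
hypothetical, and this is not necessarily how udmsearch does it.

```c
#include <stddef.h>

/* Hypothetical example: four iso-8859-1 letters, upper case first,
 * then the matching lower case.  0xDF (sharp s) has no upper-case
 * form, so it is listed twice. */
static const unsigned char example_charset[] = {
    0xC0, 0xC9, 0xD6, 0xDF,   /* upper half */
    0xE0, 0xE9, 0xF6, 0xDF    /* lower half, same order */
};

/* Fill fold[] so that fold[c] is the lower-case form of byte c.
 * Bytes that are not listed map to themselves. */
static void build_fold_table(const unsigned char *charset, size_t len,
                             unsigned char fold[256])
{
    size_t half = len / 2;

    for (int i = 0; i < 256; i++)
        fold[i] = (unsigned char)i;

    /* ASCII letters are part of every charset mentioned here. */
    for (int c = 'A'; c <= 'Z'; c++)
        fold[c] = (unsigned char)(c - 'A' + 'a');

    /* Pair the i-th upper-case byte with the i-th lower-case byte. */
    for (size_t i = 0; i < half; i++)
        fold[charset[i]] = charset[half + i];
}
```

For a language with no upper/lower at all, the pairing loop would simply
be skipped and the set used only to decide which bytes are letters.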

The format is flexible:
characters in their weird form:
  "áâ÷çäå³öúéêëìíîïðòóôõæèãþûýÿùøüàñÁÂ×ÇÄÅ£ÖÚÉÊËÌÍÎÏÐÒÓÔÕÆÈÃÞÛÝßÙØÜÀÑ"
characters in their decimal equivalents:
  193,195,194,196,161,198,200,199,207,201,204,203,202,208,205,206,197
characters in their hex equivalents:
  0x8d, 0x8e, 0x90, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7
 
As long as I can do something like char mycharset[] = <your stuff>; in C
then I'm happy.
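
As a sketch of what that could look like, the decimal and hex lists above
drop straight into C initialisers.  I use unsigned char here rather than
plain char, since plain char may be signed and values above 127 would not
fit cleanly.  The names charset_dec, charset_hex and build_word_table are
made up for illustration and are not taken from udmsearch.

```c
#include <stddef.h>
#include <string.h>

/* The decimal example list, as a C array. */
static const unsigned char charset_dec[] = {
    193, 195, 194, 196, 161, 198, 200, 199, 207,
    201, 204, 203, 202, 208, 205, 206, 197
};

/* The hex example list, as a C array. */
static const unsigned char charset_hex[] = {
    0x8d, 0x8e, 0x90, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7
};

/* Mark every byte that counts as part of a word:
 * ASCII [A-Za-z0-9] plus whatever the charset lists. */
static void build_word_table(const unsigned char *charset, size_t len,
                             unsigned char is_word[256])
{
    memset(is_word, 0, 256);

    for (int c = '0'; c <= '9'; c++) is_word[c] = 1;
    for (int c = 'A'; c <= 'Z'; c++) is_word[c] = 1;
    for (int c = 'a'; c <= 'z'; c++) is_word[c] = 1;

    for (size_t i = 0; i < len; i++)
        is_word[charset[i]] = 1;
}
```

An indexer could then test is_word[byte] instead of a hard-coded ASCII
range.  The raw-character form works the same way as a string literal,
provided the source file is saved in the charset in question.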

I will pass on these charsets to upstream to include in udmsearch
proper.  I'll try to make sure you get acknowledged (include the email
address you want listed).

Hope it isn't too much trouble, but it will mean that udmsearch will
index in your language very nicely.

For the dual-byte folks, I don't think this will work.  The upstream
author is willing to work with you, but he's not sure how to do it.
Actually it may work... if you put both bytes into the charset.
Depends on what your whitespace looks like.

  - Craig

-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.eye-net.com.au/        <csmall@eye-net.com.au>
MIEEE <csmall@ieee.org>                 Debian developer <csmall@debian.org>


