Bug#396528: hunspell on arm



On Sat, Nov 11, 2006 at 10:48:44PM +0200, Riku Voipio wrote:
> All have a utf-8 character in WORDCHARS. I didn't
> spot anything obvious in WORDCHARS parsing which could
> break only on arm.

Ah, that is indeed the key, or so it seems. hunspell converts UTF-8 to UCS-2
(it says "UTF-16", but it really seems to be UCS-2), using a function with
this prototype:

  int u8_u16(w_char * dest, int size, const char * src) {

It then goes on doing stuff like

  u2->h = (*u8 & 0x1f) >> 2;
  u2->l = (*u8 << 6) + (*(u8+1) & 0x3f);
  u8++;

Now, consider the fact that char is an unsigned type on arm, and signed on
most other platforms, and I guess we have the source of our bug: every byte of
a multi-byte UTF-8 sequence is >= 0x80, so it is negative where char is signed
and positive where it is not. (Now, I have no idea why hunspell has its own
definitions and functions instead of using the existing wchar_t type and
functions like mbstowcs, but I'm not going to change that.)
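
To make the signedness difference concrete, here is a minimal sketch. It is
not hunspell code, and all helper names are made up for illustration. It shows
what the UTF-8 lead byte 0xC3 turns into after integer promotion for each
flavour of char, and one way to decode a two-byte sequence that does not
depend on the platform's choice, assuming the input is a well-formed two-byte
sequence:

```c
#include <assert.h>
#include <limits.h>

/* All helper names here are hypothetical; none of this is hunspell code. */

/* 1 if plain char is signed on this target (e.g. i386), 0 if it is
   unsigned (e.g. arm). */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Value of a byte after integer promotion when it is held in a signed char.
   Converting a value above SCHAR_MAX is implementation-defined; this assumes
   the usual two's-complement wraparound. */
int promoted_from_signed(unsigned char byte) {
    return (signed char)byte;
}

/* Value of the same byte after promotion when it is held in an unsigned char. */
int promoted_from_unsigned(unsigned char byte) {
    return byte;
}

/* Decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
   Reading through unsigned char keeps the result identical whether plain
   char is signed or not. No validation is done. */
unsigned int decode_two_byte(const char *src) {
    const unsigned char *u8 = (const unsigned char *)src;
    return ((unsigned int)(u8[0] & 0x1f) << 6) | (unsigned int)(u8[1] & 0x3f);
}

/* Small self-check using "ä" (U+00E4), which is C3 A4 in UTF-8. */
void check_examples(void) {
    assert(promoted_from_signed(0xC3) == -61);
    assert(promoted_from_unsigned(0xC3) == 195);
    assert(decode_two_byte("\xC3\xA4") == 0xE4);
}
```

Any comparison, table lookup or arithmetic done directly on the promoted
value sees -61 on one platform and 195 on the other, which is the kind of
divergence that only shows up on arm. Casting the source pointer to
const unsigned char * once at the top of the function is one common way to
avoid that.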

I'll be sure to fix the locale bug too in the same upload, although I'm not
bothering to file it as a separate bug.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
