Bug#396528: hunspell on arm

To: Riku Voipio <riku.voipio@iki.fi>
Cc: 396528@bugs.debian.org
Subject: Bug#396528: hunspell on arm
From: "Steinar H. Gunderson" <sgunderson@bigfoot.com>
Date: Wed, 15 Nov 2006 01:07:06 +0100
Message-id: <20061115000706.GA1275@uio.no>
Reply-to: "Steinar H. Gunderson" <sgunderson@bigfoot.com>, 396528@bugs.debian.org
In-reply-to: <20061111204844.GA5877@kos.to>
References: <20061111204844.GA5877@kos.to>

On Sat, Nov 11, 2006 at 10:48:44PM +0200, Riku Voipio wrote:
> All have a utf-8 character in WORDCHARS. I didn't
> spot anything obvious in WORDCHARS parsing which could
> break only on arm.

Ah, that is indeed the key, or so it seems. hunspell converts UTF-8 to UCS-2
(it says "UTF-16", but it really seems to be UCS-2), using a function with
this prototype:

  int u8_u16(w_char * dest, int size, const char * src) {

It then goes on doing stuff like

  u2->h = (*u8 & 0x1f) >> 2;
  u2->l = (*u8 << 6) + (*(u8+1) & 0x3f);
  u8++;
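
For reference, the two assignments above can be exercised in isolation. This
is only a sketch, not hunspell's code: the wch struct (two unsigned bytes, h
and l) and the decode2 helper are hypothetical stand-ins, and only two-byte
sequences are handled. The source bytes are read through unsigned char, so
the result does not depend on whether plain char is signed.

```c
/* Hypothetical stand-in for hunspell's w_char: assumed to be a high byte
 * and a low byte, both unsigned. */
typedef struct {
    unsigned char h;
    unsigned char l;
} wch;

/* Decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into h/l.
 * The arithmetic mirrors the quoted lines; no validation is done. */
static wch decode2(const char *src)
{
    /* Read through unsigned char so the values are always 0..255. */
    const unsigned char *u8 = (const unsigned char *)src;
    wch out;

    out.h = (unsigned char)((u8[0] & 0x1f) >> 2);
    /* Keep only the low eight bits of the shifted sum. */
    out.l = (unsigned char)(((u8[0] << 6) + (u8[1] & 0x3f)) & 0xff);
    return out;
}
```

With this, the bytes C3 A4 (U+00E4) give h = 0x00, l = 0xE4, and C5 A1
(U+0161) give h = 0x01, l = 0x61.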

Now, consider the fact that char is an unsigned type on arm, and signed on
most other platforms, and I guess we have the source of our bug. (Now, I have
no idea why hunspell has its own definitions and functions instead of
using the existing wchar_t type and functions like mbstowcs, but I'm not
going to change that.)
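
To illustrate the kind of difference char signedness makes (a sketch only;
none of these helpers exist in hunspell, and this does not claim to be the
exact spot where it breaks): the same UTF-8 lead byte promotes to a negative
int where char is signed and to a positive one where it is unsigned, so a
test like "c < 0" gives different answers. Masking through unsigned char
behaves the same everywhere.

```c
#include <limits.h>

/* The byte as it looks after integer promotion when char is signed.
 * The conversion to signed char is implementation-defined for values
 * above 127; common compilers wrap around (0xC3 becomes -61). */
static int seen_as_signed(unsigned char byte)
{
    return (signed char)byte;
}

/* The same byte as it looks when char is unsigned (as on arm). */
static int seen_as_unsigned(unsigned char byte)
{
    return byte;
}

/* A check that only works where char is signed. */
static int naive_is_non_ascii(int promoted)
{
    return promoted < 0;
}

/* A check that works regardless of the signedness of plain char. */
static int portable_is_non_ascii(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}

/* Reports which flavour of plain char this platform has. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Here naive_is_non_ascii(seen_as_signed(0xC3)) is true, while
naive_is_non_ascii(seen_as_unsigned(0xC3)) is false; the portable check
agrees on both kinds of platform.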

I'll be sure to fix the locale bug too in the same upload, although I'm not
bothering to file it as a separate bug.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
