
Bug#689997: iconv: illegal input sequence at position 86 ERROR: Conversion of /usr/share/hunspell/hu_HU_u8.aff failed



Package: myspell-hu
Version: 1.2+repack-2
Severity: normal
Control: affects -1 postgresql-common

Hi myspell-hu maintainer!

It seems that there is non-UTF-8 data in /usr/share/hunspell/hu_HU_u8.aff.
It causes pg_updatedicts to emit an error message when the file is
converted to the PostgreSQL form (see the transcript below).

It appears to be caused (at least in part) by ISO-8859-1 characters in
the comments of an otherwise UTF-8 file:

0 dkg@stylus:~/tmp$ grep ^SET </usr/share/hunspell/hu_HU_u8.aff 
SET UTF-8     
0 dkg@stylus:~/tmp$ hd </usr/share/hunspell/hu_HU_u8.aff | head -n6
00000000  23 20 54 68 69 73 20 64  69 63 74 69 6f 6e 61 72  |# This dictionar|
00000010  79 20 69 73 20 62 61 73  65 64 20 6f 6e 20 74 68  |y is based on th|
00000020  65 20 48 75 6e 67 61 72  69 61 6e 20 77 6f 72 64  |e Hungarian word|
00000030  6c 69 73 74 20 61 6e 64  20 61 66 66 69 78 65 73  |list and affixes|
00000040  20 63 72 65 61 74 65 64  20 20 20 20 20 20 20 0a  | created       .|
00000050  23 20 62 79 20 4c e1 73  7a 6c f3 20 4e e9 6d 65  |# by L.szl. N.me|
0 dkg@stylus:~/tmp$ 

Note the use of 0xe1 for á, 0xf3 for ó and 0xe9 for é, which are the
ISO-8859-1 encodings. In UTF-8 these characters would be the two-byte
sequences c3 a1, c3 b3 and c3 a9.
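To tie the hexdump to the iconv error, a small Perl check can walk the file one well-formed UTF-8 character at a time and report the byte offset where it stops. This is only a sketch: first_invalid_utf8_offset and check-utf8.pl are hypothetical names, not part of pg_updatedicts or hunspell, and the pattern follows the RFC 3629 definition of well-formed UTF-8. The dump's second row starts at 0x50 (80), so on these header bytes the check should stop at byte 86, the 0xe1 in "László", which is the position iconv reports in the transcript below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One well-formed UTF-8 character (RFC 3629): rejects overlong forms,
# surrogates and code points above U+10FFFF.
my $UTF8_CHAR = qr/
      [\x00-\x7F]
    | [\xC2-\xDF][\x80-\xBF]
    | \xE0[\xA0-\xBF][\x80-\xBF]
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
    | \xED[\x80-\x9F][\x80-\xBF]
    | \xF0[\x90-\xBF][\x80-\xBF]{2}
    | [\xF1-\xF3][\x80-\xBF]{3}
    | \xF4[\x80-\x8F][\x80-\xBF]{2}
/x;

# Hypothetical helper: return the 0-based offset of the first byte that
# does not start a well-formed UTF-8 sequence, or undef if all is valid.
sub first_invalid_utf8_offset {
    my ($bytes) = @_;
    pos($bytes) = 0;
    # Consume one valid character per match; /c keeps pos() on failure.
    1 while $bytes =~ /\G$UTF8_CHAR/gc;
    my $offset = pos($bytes);
    return $offset == length($bytes) ? undef : $offset;
}

# Usage: perl check-utf8.pl /usr/share/hunspell/hu_HU_u8.aff
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "cannot open $file: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    my $offset = first_invalid_utf8_offset($bytes);
    if (defined $offset) {
        printf "%s: invalid UTF-8 at byte %d (0x%02x)\n",
            $file, $offset, ord substr($bytes, $offset, 1);
    } else {
        print "$file: valid UTF-8\n";
    }
}
```

Matching one character per iteration, rather than one big `(?:...)*` over the whole file, avoids Perl's recursion limit on complex quantified groups, which could otherwise make a large file look valid.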

This affects pg_updatedicts because that program calls iconv on the
file like this:

  system 'iconv', '-f', $enc, '-t', 'UTF-8', '-o', "$cachedir/$locale.affix", $aff

where $enc comes from:

sub get_encoding {
    open F, $_[0] or die "cannot open $_[0]: $!";
    while (<F>) {
        if (/^SET ([\w-]+)\s*$/) { return $1; }
    }
    return undef;
}


There are many other non-UTF-8 characters in hu_HU_u8.aff, though:
simply converting all the comments to UTF-8 does not appear to be
enough to let the pg_updatedicts run complete without a noisy
warning.
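To see how far beyond the comments the problem goes, a sketch like the following lists every line that fails strict UTF-8 decoding and separates comment lines from the rest. It assumes that a comment in the .aff file is a line whose first non-blank character is `#`; invalid_utf8_lines and scan-aff.pl are hypothetical names, and the output format is only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: return one record per line of $fh that is not
# valid UTF-8, noting whether that line is a comment.  Assumption: a
# comment is a line whose first non-blank character is '#'.
sub invalid_utf8_lines {
    my ($fh) = @_;
    my @bad;
    while (my $line = <$fh>) {
        # decode() may modify its argument when CHECK is set, so decode
        # a copy; FB_CROAK turns malformed input into an exception.
        my $copy = $line;
        next if eval { decode('UTF-8', $copy, FB_CROAK); 1 };
        push @bad, { line => $., comment => ($line =~ /^\s*#/ ? 1 : 0) };
    }
    return @bad;
}

# Usage: perl scan-aff.pl /usr/share/hunspell/hu_HU_u8.aff
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "cannot open $file: $!";
    my @bad = invalid_utf8_lines($fh);
    close $fh;
    my $comments = grep { $_->{comment} } @bad;
    printf "%s: %d non-UTF-8 lines (%d comments, %d other)\n",
        $file, scalar(@bad), $comments, @bad - $comments;
    # The non-comment lines are the ones a comments-only fix would miss.
    print "  line $_->{line}\n" for grep { !$_->{comment} } @bad;
}
```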

If you think the file is legitimate as it stands (I confess I don't
really understand the affix file format), please reassign this bug to
postgresql-common (which owns pg_updatedicts) so that its maintainers
can improve the file handling.

Arguably, this is also a bug in postgresql-common, because it does not
verify that the iconv transformation worked correctly at all; it just
pushes ahead.
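On the postgresql-common side, one way to avoid pushing ahead after a bad conversion is to decode strictly and only install the result once the whole file has converted. The sketch below swaps in Perl's Encode module for the external iconv call; convert_to_utf8 is a hypothetical name, not an existing pg_updatedicts function, and the temp-file-plus-rename step and the 0644 mode are design assumptions rather than anything pg_updatedicts is known to do.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical replacement for: system 'iconv', '-f', $enc, ...
# Converts $src from $enc to UTF-8 and installs it as $dst only if the
# whole file decoded cleanly.  Returns nothing on success, or an error
# message; on failure $dst is left untouched.
sub convert_to_utf8 {
    my ($src, $dst, $enc) = @_;

    open my $in, '<:raw', $src or return "cannot open $src: $!";
    my $bytes = do { local $/; <$in> };
    close $in;

    # FB_CROAK makes decode() die on the first byte that is illegal in
    # $enc, instead of silently substituting a replacement character.
    my $text = eval { decode($enc, $bytes, FB_CROAK) };
    return "conversion of $src from $enc failed: $@" unless defined $text;

    # Write into a temp file next to $dst, then rename it into place.
    my ($out, $tmp) = tempfile(DIR => dirname($dst), UNLINK => 0);
    binmode $out;
    my $ok = (print {$out} encode('UTF-8', $text))
          && close($out)
          # tempfile() creates mode 0600; assume the cache must be readable.
          && chmod(0644, $tmp)
          && rename($tmp, $dst);
    unless ($ok) {
        my $err = $!;
        unlink $tmp;
        return "writing $dst failed: $err";
    }
    return;
}
```

A caller could then report the error and skip that one dictionary, with the destination file only ever replaced by a complete, cleanly converted copy.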

Regards,

	--dkg

0 root@stylus:~# pg_updatedicts
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
  en_us
0 root@stylus:~# apt-get install myspell-hu
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  openoffice.org
The following NEW packages will be installed:
  myspell-hu
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/1,105 kB of archives.
After this operation, 6,092 kB of additional disk space will be used.
Selecting previously unselected package myspell-hu.
(Reading database ... 271340 files and directories currently installed.)
Unpacking myspell-hu (from .../myspell-hu_1.2+repack-2_all.deb) ...
Setting up myspell-hu (1.2+repack-2) ...
0 root@stylus:~# pg_updatedicts
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
  en_us
  hu_hu
  hu_hu_u8
iconv: illegal input sequence at position 86
ERROR: Conversion of /usr/share/hunspell/hu_HU_u8.aff failed
0 root@stylus:~# apt-get purge myspell-hu
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  myspell-hu*
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 6,092 kB disk space will be freed.
Do you want to continue [Y/n]? 
(Reading database ... 271351 files and directories currently installed.)
Removing myspell-hu ...
0 root@stylus:~# pg_updatedicts
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
  en_us
0 root@stylus:~# 


-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 3.5-trunk-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages myspell-hu depends on:
ii  dictionaries-common  1.12.10

myspell-hu recommends no packages.

Versions of packages myspell-hu suggests:
pn  openoffice.org  <none>

-- no debconf information

