Bug#689997: iconv: illegal input sequence at position 86 ERROR: Conversion of /usr/share/hunspell/hu_HU_u8.aff failed
Package: myspell-hu
Version: 1.2+repack-2
Severity: normal
Control: affects -1 postgresql-common
Hi myspell-hu maintainer!
It seems that there is non-UTF-8 data in /usr/share/hunspell/hu_HU_u8.aff.
This causes pg_updatedicts to emit an error message when it converts the
file to the postgres form (see transcript below).
It appears to be due (at least) to ISO-8859-1 characters in the
comments in an otherwise-UTF-8 file:
0 dkg@stylus:~/tmp$ grep ^SET </usr/share/hunspell/hu_HU_u8.aff
SET UTF-8
0 dkg@stylus:~/tmp$ hd </usr/share/hunspell/hu_HU_u8.aff | head -n6
00000000 23 20 54 68 69 73 20 64 69 63 74 69 6f 6e 61 72 |# This dictionar|
00000010 79 20 69 73 20 62 61 73 65 64 20 6f 6e 20 74 68 |y is based on th|
00000020 65 20 48 75 6e 67 61 72 69 61 6e 20 77 6f 72 64 |e Hungarian word|
00000030 6c 69 73 74 20 61 6e 64 20 61 66 66 69 78 65 73 |list and affixes|
00000040 20 63 72 65 61 74 65 64 20 20 20 20 20 20 20 0a | created .|
00000050 23 20 62 79 20 4c e1 73 7a 6c f3 20 4e e9 6d 65 |# by L.szl. N.me|
0 dkg@stylus:~/tmp$
Note the use of 0xe1 for á, 0xf3 for ó and 0xe9 for é, which is
ISO-8859-1. The 0xe1 byte sits at offset 0x56 (decimal 86), which matches
the position iconv reports.
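For anyone who wants to reproduce the offset without iconv, here is a minimal Python sketch. The function name first_invalid_utf8_offset is made up for illustration, and the sample bytes are just the 96 bytes from the hexdump above, not the whole file.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")  # strict decoding: raises on the first bad sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


# The first 96 bytes of hu_HU_u8.aff, copied from the hexdump in this report.
HEAD = bytes.fromhex(
    "23 20 54 68 69 73 20 64 69 63 74 69 6f 6e 61 72 "
    "79 20 69 73 20 62 61 73 65 64 20 6f 6e 20 74 68 "
    "65 20 48 75 6e 67 61 72 69 61 6e 20 77 6f 72 64 "
    "6c 69 73 74 20 61 6e 64 20 61 66 66 69 78 65 73 "
    "20 63 72 65 61 74 65 64 20 20 20 20 20 20 20 0a "
    "23 20 62 79 20 4c e1 73 7a 6c f3 20 4e e9 6d 65"
)

if __name__ == "__main__":
    print(first_invalid_utf8_offset(HEAD))  # prints 86
```

The offset is 86 because 0xe1 announces a three-byte UTF-8 sequence, but the next byte (0x73, "s") is not a continuation byte.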
This affects pg_updatedicts, because that program calls iconv on the
file like this:
system 'iconv', '-f', $enc, '-t', 'UTF-8', '-o', "$cachedir/$locale.affix", $aff
where $enc comes from:
sub get_encoding {
    open F, $_[0] or die "cannot open $_[0]: $!";
    while (<F>) {
        if (/^SET ([\w-]+)\s*$/) { return $1; }
    }
    return undef;
}
There are many non-UTF-8 characters outside the comments in
hu_HU_u8.aff as well, though: simply transforming all the comments to
UTF-8 doesn't appear to be enough to make the pg_updatedicts invocation
complete without a noisy warning.
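One rough way to see how the bad bytes are distributed is to count offending lines inside and outside comments. This is only a sketch: count_invalid_lines is a hypothetical helper, and it assumes that a comment is any line whose first non-blank character is "#", which I have not checked against the affix file format.

```python
from typing import Tuple


def count_invalid_lines(data: bytes) -> Tuple[int, int]:
    """Count lines that are not valid UTF-8: (in comments, elsewhere)."""
    in_comments = 0
    elsewhere = 0
    # Splitting on 0x0a is safe: that byte never occurs inside a
    # multi-byte UTF-8 sequence.
    for line in data.split(b"\n"):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: lines starting with '#' are comments.
            if line.lstrip().startswith(b"#"):
                in_comments += 1
            else:
                elsewhere += 1
    return in_comments, elsewhere


if __name__ == "__main__":
    with open("/usr/share/hunspell/hu_HU_u8.aff", "rb") as f:
        print(count_invalid_lines(f.read()))
```

A non-zero second number would confirm that fixing the comments alone cannot be enough.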
If you think the file is legitimate as it stands (I confess I don't
really understand the affix file format), please reassign this bug to
postgresql-common (which owns pg_updatedicts) so that
postgresql-common can do better file handling.
Arguably, this is also a bug in postgresql-common because it is not
verifying that the iconv transformation worked correctly at all, but
just pushes ahead.
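To illustrate what "verifying the transformation" could look like, here is a sketch in Python rather than in pg_updatedicts' own Perl. It is not a patch: get_encoding mirrors the Perl routine quoted above, while convert_affix and its error handling are invented for this example, and Python's codecs stand in for iconv.

```python
import re
from pathlib import Path
from typing import Optional

SET_RE = re.compile(rb"^SET ([\w-]+)\s*$")


def get_encoding(aff_path: str) -> Optional[str]:
    """Return the encoding named on the file's SET line, or None."""
    with open(aff_path, "rb") as f:
        for line in f:
            match = SET_RE.match(line)
            if match:
                return match.group(1).decode("ascii")
    return None


def convert_affix(aff_path: str, out_path: str) -> None:
    """Convert aff_path to UTF-8; write nothing if the input is malformed."""
    enc = get_encoding(aff_path)
    if enc is None:
        raise ValueError(f"{aff_path}: no SET line found")
    data = Path(aff_path).read_bytes()
    try:
        text = data.decode(enc)  # strict: fail instead of pushing ahead
    except UnicodeDecodeError as err:
        raise ValueError(
            f"{aff_path}: byte 0x{data[err.start]:02x} at offset "
            f"{err.start} is not valid {enc}"
        ) from err
    # Only reached when the whole file decoded cleanly.
    Path(out_path).write_bytes(text.encode("utf-8"))
```

The point of the design is that the output file is written only after the whole input has decoded cleanly, so a failed conversion cannot leave a half-converted dictionary behind.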
Regards,
--dkg
0 root@stylus:~# pg_updatedicts
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
en_us
0 root@stylus:~# apt-get install myspell-hu
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
openoffice.org
The following NEW packages will be installed:
myspell-hu
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/1,105 kB of archives.
After this operation, 6,092 kB of additional disk space will be used.
Selecting previously unselected package myspell-hu.
(Reading database ... 271340 files and directories currently installed.)
Unpacking myspell-hu (from .../myspell-hu_1.2+repack-2_all.deb) ...
Setting up myspell-hu (1.2+repack-2) ...
0 root@stylus:~# pg_updatedicts
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
en_us
hu_hu
hu_hu_u8
iconv: illegal input sequence at position 86
ERROR: Conversion of /usr/share/hunspell/hu_HU_u8.aff failed
0 root@stylus:~# apt-get purge myspell-hu
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
myspell-hu*
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 6,092 kB disk space will be freed.
Do you want to continue [Y/n]?
(Reading database ... 271351 files and directories currently installed.)
Removing myspell-hu ...
0 root@stylus:~# pg_updatedicts
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
en_us
0 root@stylus:~#
-- System Information:
Debian Release: wheezy/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 3.5-trunk-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages myspell-hu depends on:
ii dictionaries-common 1.12.10
myspell-hu recommends no packages.
Versions of packages myspell-hu suggests:
pn openoffice.org <none>
-- no debconf information