Bug#1050611: /usr/bin/iconv: iconv use_from_charmap mixes up char_table and byte_table
Package: libc-bin
Version: 2.37-7
Severity: normal
File: /usr/bin/iconv
Tags: patch
X-Debbugs-Cc: bugs.debian.org@wongs.net
Dear Maintainer,
The iconv program, following POSIX, allows charmap files to be used
directly for conversion without having to be compiled into a gconv
module. For example,
iconv -f ./palimpsest.charmap
This is a very handy feature as it allows end users to quickly make
custom mappings without needing to compile a gconv module.
Unfortunately, due to a simple bug (using the wrong hash table), iconv
scrambles the conversion when the char hash table is realloc'd.
Changing `char_table` to `byte_table` in iconv/iconv_charmap.c:339
will fix this. (Patch attached.)
An example file, palimpsest.charmap, that exercises this bug is also
attached.
Current version of iconv:
$ echo 0123456789 | iconv -f ./palimpsest.charmap
෦꩑꧒꧓४꘥꧖෭෮෯
Patched version of iconv:
$ echo 0123456789 | iconv -f ./palimpsest.charmap
0123456789
-- System Information:
Debian Release: trixie/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 6.4.0-2-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages libc-bin depends on:
ii libc6 2.37-7
Versions of packages libc-bin recommends:
ii manpages 6.03-2
libc-bin suggests no packages.
-- no debconf information
--- iconv/iconv_charmap.c.orig 2023-01-31 19:27:45.000000000 -0800
+++ iconv/iconv_charmap.c 2023-08-26 06:46:31.704552956 -0700
@@ -336,7 +336,7 @@
rettbl = allocate_table ();
- while (iterate_table (&from_charmap->char_table, &ptr, &key, &keylen, &data)
+ while (iterate_table (&from_charmap->byte_table, &ptr, &key, &keylen, &data)
>= 0)
{
struct charseq *in = data;
<code_set_name> ATAVISTIC-PALIMPSEST
<comment_char> %
<escape_char> /
% alias PALIMPSEST
% Test file for the iconv charmap bug (present in glibc as of 2023-08-26).
% iconv uses two hash tables: char (to byte) mapping and byte (to char).
% The following charmap exercises both hash tables by forcing each of
% them to realloc memory, which occurs at 75% of their initial size (257).
% When the 193rd entry is added, a new hash table of twice the size is
% created and the old one copied in.
% Usage:
% echo 0123456789 | iconv -f ./palimpsest.charmap
% Correct output:
% 0123456789
CHARMAP
% Force char_table to realloc
<U0000>..<U007F> /x00 Total: 128 UCS characters have been mapped.
<UAA50>..<UAA59> /x30 138. Cham digits
<UA9F0>..<UA9F9> /x30 148. Myanmar Tai Laing digits
<UA9D0>..<UA9D9> /x30 158. Javanese digits
<UA620>..<UA629> /x30 168. Vai digits
<U0F20>..<U0F29> /x30 178. Tibetan digits
<U0DE6>..<U0DEF> /x30 188. Sinhala Lith digits
<U0966>..<U096F> /x30 198. Devanagari digits
% Force byte_table to realloc Total: 128 Byte sequences have been mapped
<U07C0>..<U07C9> /d128 138. Nko digits
<U09E6>..<U09EF> /d138 148. Bengali digits
<U0A66>..<U0A6F> /d148 158. Gurmukhi digits
<U0AE6>..<U0AEF> /d158 168. Gujarati digits
<U0B66>..<U0B6F> /d168 178. Oriya digits
<UA900>..<UA909> /d178 188. Kayah Li digits
<U104A0>..<U104A9> /d188 198. Osmanya digits
END CHARMAP
% Verbose explanation.
% Multiple UCS characters are allowed to map to one particular byte
% encoding, but when mapping *from* the character set, only the first
% entry is supposed to be used to find the corresponding UCS character.
% Before the 193rd character is added, iconv correctly maps
% bytes from 0x30 to 0x39 as the digits 0 to 9:
%
% $ echo 0123456789 | iconv -f ./palimpsest.charmap
% 0123456789
% In the buggy version of iconv, after the 193rd character is added,
% the result is garbled:
%
% $ echo 0123456789 | iconv -f ./palimpsest.charmap
% ෦꩑꧒꧓४꘥꧖෭෮෯
% To trigger this error the same byte sequence has to be used more
% than once. As mentioned above, duplicate byte sequences are supposed
% to be hidden in the reverse direction. After the 193rd char entry,
% the buggy version of iconv acts as if some layers have been scraped
% off, revealing those underlying maps:
% 0x30 ෦ U+0DE6 SINHALA LITH DIGIT ZERO
% 0x31 ꩑ U+AA51 CHAM DIGIT ONE
% 0x32 ꧒ U+A9D2 JAVANESE DIGIT TWO
% 0x33 ꧓ U+A9D3 JAVANESE DIGIT THREE
% 0x34 ४ U+096A DEVANAGARI DIGIT FOUR
% 0x35 ꘥ U+A625 VAI DIGIT FIVE
% 0x36 ꧖ U+A9D6 JAVANESE DIGIT SIX
% 0x37 ෭ U+0DED SINHALA LITH DIGIT SEVEN
% 0x38 ෮ U+0DEE SINHALA LITH DIGIT EIGHT
% 0x39 ෯ U+0DEF SINHALA LITH DIGIT NINE
% Analysis: when the hashtable is 75% full, memory is reallocated.
% Initial hashtable size is 257 (first prime after 256) and 75% of
% that is 192.75. So, realloc is triggered on the 193rd character.
% This bug wasn't caused by the memory reallocation, only made
% visible. iconv seemed to work previously because the iteration order
% of the char_table hash just happened to match the insertion order
% from the file.
% The problem was triggered when the char_table, which maps from a UCS
% character to the byte sequence, was resized, but the bug occurred in
% the reverse direction. That pointed to the solution:
% iconv_charmap.c:use_from_charmap() should call iterate_table() on
% byte_table, not char_table.