
Bug#1050611: /usr/bin/iconv: iconv use_from_charmap mixes up char_table and byte_table



Package: libc-bin
Version: 2.37-7
Severity: normal
File: /usr/bin/iconv
Tags: patch
X-Debbugs-Cc: bugs.debian.org@wongs.net

Dear Maintainer,

The iconv program, following POSIX, allows charmap files to be used
directly for conversion without having to be compiled into a gconv
module. For example,

    iconv -f ./palimpsest.charmap

This is a very handy feature as it allows end users to quickly make
custom mappings without needing to compile a gconv module.
Unfortunately, due to a simple bug (using the wrong hash table), iconv
scrambles the conversion when the char hash table is realloc'd.

Changing `char_table` to `byte_table` in iconv/iconv_charmap.c:339
will fix this. (Patch attached.)
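
To illustrate why the choice of table matters, here is a toy model in
Python. It is only a sketch: plain dicts stand in for glibc's hash
tables, the function names are made up for illustration, and the
reversed list merely simulates an iteration order that no longer
matches the file order. It assumes byte_table keeps only the first
character listed for each byte, as the charmap semantics require.

```python
# Toy model only: plain Python containers stand in for glibc's hash tables.
# Charmap entries in file order: (UCS code point, byte value).
entries = [
    (0x0030, 0x30),  # DIGIT ZERO, listed first, so it should own byte 0x30
    (0xAA50, 0x30),  # CHAM DIGIT ZERO, duplicate byte
    (0x0DE6, 0x30),  # SINHALA LITH DIGIT ZERO, duplicate byte
]

def build_tables(entries):
    """Return (char_table, byte_table) for the given charmap entries."""
    char_table = {}  # UCS -> byte: every entry is stored
    byte_table = {}  # byte -> UCS: only the first entry per byte is stored
    for ucs, byte in entries:
        char_table[ucs] = byte
        byte_table.setdefault(byte, ucs)
    return char_table, byte_table

def decoder_from_char_table(char_table, order):
    """Buggy approach: invert char_table, visiting its keys in `order`."""
    decode = {}
    for ucs in order:
        decode.setdefault(char_table[ucs], ucs)  # first visited wins
    return decode

def decoder_from_byte_table(byte_table):
    """Fixed approach: byte_table already holds one UCS value per byte."""
    return dict(byte_table)

char_table, byte_table = build_tables(entries)
insertion_order = [ucs for ucs, _ in entries]
rehashed_order = list(reversed(insertion_order))  # stand-in for a post-resize order

print(hex(decoder_from_char_table(char_table, insertion_order)[0x30]))  # 0x30
print(hex(decoder_from_char_table(char_table, rehashed_order)[0x30]))   # 0xde6
print(hex(decoder_from_byte_table(byte_table)[0x30]))                   # 0x30
```

Inverting char_table gives the right answer only while its iteration
order happens to match the file order. Reading byte_table does not
depend on iteration order at all.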

An example file, palimpsest.charmap, that exercises this bug is also
attached.
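
The comments in the attached file put the resize at the 193rd entry.
The arithmetic can be checked with a short Python sketch. The helper
name is hypothetical, and the 75% load limit and initial size of 257
are taken from those comments rather than from the glibc source.

```python
def entries_before_resize(size, load_percent=75):
    """Largest entry count that stays within the load limit (integer math)."""
    return size * load_percent // 100

initial_size = 257  # first prime after 256, per the charmap's comments
fits = entries_before_resize(initial_size)

# The first block of the charmap maps 128 ASCII characters plus
# seven ranges of ten digits each.
first_block = 128 + 7 * 10

print(fits)         # 192
print(fits + 1)     # 193: the entry that triggers the resize
print(first_block)  # 198: enough to cross the threshold
```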

Current version of iconv:

	$ echo 0123456789 | iconv -f ./palimpsest.charmap
	෦꩑꧒꧓४꘥꧖෭෮෯

Patched version of iconv:

	$ echo 0123456789 | iconv -f ./palimpsest.charmap
	0123456789
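
The garbled output is not random: each character is a decimal digit
from another script that the charmap also maps to the same byte. The
following Python sketch checks this with the standard unicodedata
module. The code points are copied from the table in the attached
charmap's comments; the helper name is made up for illustration, and
the script assumes a Python whose Unicode database includes these
characters (Unicode 7.0 or later).

```python
import unicodedata

# Code points of the garbled output for bytes 0x30 through 0x39, in order,
# as listed in the attached charmap's comments.
garbled = [0x0DE6, 0xAA51, 0xA9D2, 0xA9D3, 0x096A,
           0xA625, 0xA9D6, 0x0DED, 0x0DEE, 0x0DEF]

def digit_values(code_points):
    """Return the decimal digit value of each code point as one string."""
    return "".join(str(unicodedata.decimal(chr(cp))) for cp in code_points)

print(digit_values(garbled))  # 0123456789

# List each code point with its Unicode character name.
for cp in garbled:
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}")
```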



-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.4.0-2-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libc-bin depends on:
ii  libc6  2.37-7

Versions of packages libc-bin recommends:
ii  manpages  6.03-2

libc-bin suggests no packages.

-- no debconf information
--- iconv/iconv_charmap.c.orig	2023-01-31 19:27:45.000000000 -0800
+++ iconv/iconv_charmap.c	2023-08-26 06:46:31.704552956 -0700
@@ -336,7 +336,7 @@
 
   rettbl = allocate_table ();
 
-  while (iterate_table (&from_charmap->char_table, &ptr, &key, &keylen, &data)
+  while (iterate_table (&from_charmap->byte_table, &ptr, &key, &keylen, &data)
 	 >= 0)
     {
       struct charseq *in = data;
<code_set_name> ATAVISTIC-PALIMPSEST
<comment_char> %
<escape_char> /
% alias PALIMPSEST

% Test file for iconv charmap conversion (bug present in glibc as of 2023-08-26).

% iconv uses two hash tables: char_table (UCS character to byte sequence)
% and byte_table (byte sequence to UCS character). The following charmap
% exercises both by forcing each of them to realloc memory, which happens
% once a table is 75% full (initial size 257). When the 193rd entry is
% added, a new hash table of twice the size is created and the old one
% copied in.

% Usage: 
% 	echo 0123456789 | iconv -f ./palimpsest.charmap

% Correct output:
%	0123456789

CHARMAP

% Force char_table to realloc
<U0000>..<U007F> 	/x00	Total: 128 UCS characters have been mapped.
<UAA50>..<UAA59> 	/x30	       138. Cham digits
<UA9F0>..<UA9F9> 	/x30	       148. Myanmar Tai Laing digits
<UA9D0>..<UA9D9> 	/x30	       158. Javanese digits
<UA620>..<UA629> 	/x30	       168. Vai digits
<U0F20>..<U0F29> 	/x30	       178. Tibetan digits
<U0DE6>..<U0DEF> 	/x30	       188. Sinhala Lith digits
<U0966>..<U096F> 	/x30	       198. Devanagari digits

% Force byte_table to realloc	Total: 128 Byte sequences have been mapped
<U07C0>..<U07C9> 	/d128	       138. Nko digits
<U09E6>..<U09EF>	/d138	       148. Bengali digits
<U0A66>..<U0A6F> 	/d148	       158. Gurmukhi digits
<U0AE6>..<U0AEF> 	/d158	       168. Gujarati digits
<U0B66>..<U0B6F> 	/d168	       178. Oriya digits
<UA900>..<UA909> 	/d178	       188. Kayah Li digits
<U104A0>..<U104A9>	/d188	       198. Osmanya digits

END CHARMAP



% Verbose explanation.

% Multiple UCS characters are allowed to map to one particular byte
% encoding, but when mapping *from* the character set, only the first
% entry is supposed to be used to find the corresponding UCS character.
 
% Before the 193rd character is added, iconv correctly maps
% bytes from 0x30 to 0x39 as the digits 0 to 9:
%
% 	$ echo 0123456789 | iconv -f ./palimpsest.charmap
%	0123456789

% In the buggy version of iconv, after the 193rd character is added,
% the result is garbled:
%
% 	$ echo 0123456789 | iconv -f ./palimpsest.charmap
%	෦꩑꧒꧓४꘥꧖෭෮෯

% To trigger this error the same byte sequence has to be used more
% than once. As mentioned above, duplicate byte sequences are supposed
% to be hidden in the reverse direction. After the 193rd char entry,
% the buggy version of iconv acts as if some layers have been scraped
% off, revealing those underlying maps:

%   0x30	෦	U+0DE6  SINHALA LITH DIGIT ZERO
%   0x31	꩑	U+AA51  CHAM DIGIT ONE
%   0x32	꧒	U+A9D2  JAVANESE DIGIT TWO
%   0x33	꧓	U+A9D3  JAVANESE DIGIT THREE
%   0x34	४	U+096A  DEVANAGARI DIGIT FOUR
%   0x35	꘥	U+A625  VAI DIGIT FIVE
%   0x36	꧖	U+A9D6  JAVANESE DIGIT SIX
%   0x37	෭	U+0DED  SINHALA LITH DIGIT SEVEN
%   0x38	෮	U+0DEE  SINHALA LITH DIGIT EIGHT
%   0x39	෯	U+0DEF  SINHALA LITH DIGIT NINE


% Analysis: when the hashtable is 75% full, memory is reallocated.
% Initial hashtable size is 257 (first prime after 256) and 75% of
% that is 192.75. So, realloc is triggered on the 193rd character.

% This bug wasn't caused by the memory reallocation, only made
% visible. iconv seemed to work previously because the iteration order
% of the char_table hash just happened to match the insertion order
% from the file.

% The problem was triggered when the char_table, which maps from a UCS
% character to the byte sequence, was resized, but the bug occurred in
% the reverse direction. That pointed to the solution:
% iconv_charmap.c:use_from_charmap() should call iterate_table() on
% byte_table, not char_table.

