
Bug#362514: locales: et_EE causes "sed -e expression #1 char 19: Invalid range end"



Package: locales
Version: 2.3.6-6
Severity: normal
Tags: l10n


The following expression is contained in /sbin/MAKEDEV:

 sed -e 's/[^A-Za-z0-9_]/_/g'

It is called by several postinstall scripts. 

It fails in the call to the function re_compile_pattern (regex.c in the sed
package) with the following error message:

sed: -e avaldis #1, sümbol 19: Invalid range end

(equivalent to: sed -e expression #1 char 19: Invalid range end)

sed-4.1.4, regexp.c (the call site, cf. frame #0 at regexp.c:90 in the backtraces):
87  re_set_syntax (syntax);
88>  error = re_compile_pattern (new_regex->re, new_regex->sz,
89                              &new_regex->pattern);
  
An apparently similar case was described in bug #343080, but this one cannot be
fixed by replacing ' with " (as described in bug #342868; I tried it).
Similarly, it does not depend on libc-i686.

I am filing the bug against locales as I could only reproduce the error under
the following locales (out of 381 locales):


et_EE
sed: -e avaldis #1, sümbol 19: Invalid range end
ISO-8859-1

et_EE.ISO-8859-15
sed: -e avaldis #1, sümbol 19: Invalid range end
ISO-8859-15

et_EE.UTF-8
sed: -e avaldis #1, sümbol 19: Invalid range end

vi_VN.TCVN
sed: -e expression #1, char 22: unterminated `s' command

TCVN5712-1
sed: -e expression #1, char 22: unterminated `s' command
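
Since the error message names a range end, one thing that could be checked is
how each affected locale orders the endpoints of the ranges A-Z, a-z and 0-9.
That the problem lies there is only an assumption, not something established
in this report. The helper below is hypothetical (its name and return
convention are mine), and strcoll() is only a rough proxy for whatever the
regex code consults internally:

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical diagnostic, not taken from sed or glibc.
   Returns  1 if LO collates at or before HI under LOCALE_NAME,
            0 if LO collates after HI,
           -1 if the locale cannot be selected.  */
static int
range_is_ordered (const char *locale_name, const char *lo, const char *hi)
{
  /* Only the collation category matters for this comparison.  */
  if (setlocale (LC_COLLATE, locale_name) == NULL)
    return -1;

  return strcoll (lo, hi) <= 0;
}
```

Calling range_is_ordered ("et_EE", "A", "Z"), and likewise for "a"/"z" and
"0"/"9", would show whether any of the three ranges looks reversed in that
locale. I have not run this against the locales listed above.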




I could _not_ reproduce the bug with the following program:

 
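#define _GNU_SOURCE   /* newer glibc needs this to expose the GNU regex API */
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <regex.h>
#include <stdlib.h>

int
main (void)
{
  struct re_pattern_buffer regex;
  const char *s;
  int result = 0;
  /* Syntax value observed in sed via gdb (see the backtraces below).  */
  reg_syntax_t syntax = 17105606;

  /* Note: setlocale () is never called, so this runs in the "C" locale
     regardless of LC_ALL.  */
  memset (&regex, '\0', sizeof (regex));
  re_set_syntax (syntax);
  regex.fastmap = malloc (1 << (sizeof (char) * 8));
  /* 13 is the length of the pattern, matching re_len in the backtraces.  */
  s = re_compile_pattern ("[^A-Za-z0-9_]", 13, &regex);

  if (s != NULL)
    {
      puts ("re_compile_pattern returns a non-NULL value: ");
      puts (s);
      result = 1;
    }
  else
    {
      puts (" -> OK");
    }

  return result;
}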
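
One possible reason the program above does not reproduce the error is that it
includes <locale.h> but never calls setlocale(), so it compiles the pattern in
the "C" locale whatever LC_ALL is set to. The sketch below is a variant that
selects the locale first. The helper name compile_in_locale and the
"locale not available" message are hypothetical; the syntax value is the one
shown by gdb in this report. Whether this variant actually triggers
"Invalid range end" under et_EE has not been verified here.

```c
#define _GNU_SOURCE          /* expose the GNU regex API in <regex.h> */
#include <locale.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of sed: compile the MAKEDEV bracket
   expression after switching to LOCALE_NAME.  Returns NULL on success,
   otherwise a message describing the failure.  */
static const char *
compile_in_locale (const char *locale_name)
{
  static const char pattern[] = "[^A-Za-z0-9_]";
  struct re_pattern_buffer regex;
  const char *err;

  /* Without this call the process stays in the "C" locale.  */
  if (setlocale (LC_ALL, locale_name) == NULL)
    return "locale not available";

  memset (&regex, 0, sizeof regex);
  /* Syntax value taken from the gdb output in this report.  */
  re_set_syntax ((reg_syntax_t) 17105606UL);
  err = re_compile_pattern (pattern, strlen (pattern), &regex);

  /* Release the compiled pattern when compilation succeeded.  */
  if (err == NULL)
    regfree (&regex);
  return err;
}
```

Printing the result of compile_in_locale ("C") next to that of
compile_in_locale ("et_EE") from a small main() would show whether the failure
can be reproduced outside sed. Passing "" instead of a fixed name would pick up
LC_ALL from the environment, as sed does.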


Memory management in sed is not quite trivial, so without investing more
time I am not sure what went wrong. Maybe someone who knows locales can fix it
in no time?

The version of sed is 4.1.4-7.

The successful backtrace of sed (LC_ALL=C) is the following:

(gdb) bt full
#0  compile_regex_1 (new_regex=0x805a498, needed_sub=1) at regexp.c:90
	error = 0x0
	syntax = 17105606
#1  0x0804f3ae in compile_regex (b=0x805a3f8, flags=0, needed_sub=1) at
regexp.c:150
	new_regex = (struct regex *) 0x805a498
	re_len = 13
#2  0x0804b84a in compile_program (vector=0x8059030) at compile.c:1263
	b2 = (struct buffer *) 0x805a440
	flags = 0
	slash = 47
	a = {addr_type = ADDR_IS_NULL, addr_number = 4294967295, addr_step = 0,
addr_regex = 0x0}
	cur_cmd = (struct sed_cmd *) 0x805a030
	b = (struct buffer *) 0x805a3f8
	ch = 115
#3  0x0804c1f3 in compile_string (cur_program=0x0, str=0xbfc3b8a6
"s/[^A-Za-z0-9_]/_/g", len=19) at compile.c:1567
	string_expr_count = 1
	ret = (struct vector *) 0xb7ef4ff4
#4  0x080496a5 in main (argc=3, argv=0xbfc39804) at sed.c:212
	longopts = {{name = 0x805123d "regexp-extended", has_arg = 0, flag =
0x0, val = 114}, {name = 0x805124d "expression", has_arg = 1, flag = 0x0, val =
101}, {name = 0x8051258 "file", has_arg = 1, flag = 0x0, val = 102}, {name =
0x805125d "in-place", has_arg = 2, flag = 0x0, val = 105}, {name = 0x8051266
"line-length", has_arg = 1, flag = 0x0, val = 108}, {name = 0x8051272 "quiet",
has_arg = 0, flag = 0x0, val = 110}, {name = 0x8051278 "posix", has_arg = 0,
flag = 0x0, val = 112}, {name = 0x805127e "silent", has_arg = 0, flag = 0x0, val
= 110}, {name = 0x8051285 "separate", has_arg = 0, flag = 0x0, val = 115}, {name
= 0x805128e "unbuffered", has_arg = 0, flag = 0x0, val = 117}, {name = 0x8051299
"version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x80512a1 "help",
has_arg = 0, flag = 0x0, val = 104}, {name = 0x0, has_arg = 0, flag = 0x0, val =
0}}
	opt = 101
	return_code = 134548320
	cols = 0x0
(gdb) 




The failed backtrace (LC_ALL=et_EE):

(gdb) bt full
#0  compile_regex_1 (new_regex=0x805acb8, needed_sub=1) at regexp.c:90
>!!!	error = 0xb7f600d8 "Invalid range end"
	syntax = 17105606
#1  0x0804f3ae in compile_regex (b=0x805ac18, flags=0, needed_sub=1) at
regexp.c:150
	new_regex = (struct regex *) 0x805acb8
	re_len = 13
#2  0x0804b84a in compile_program (vector=0x8059850) at compile.c:1263
	b2 = (struct buffer *) 0x805ac60
	flags = 0
	slash = 47
	a = {addr_type = ADDR_IS_NULL, addr_number = 4294967295, addr_step = 0,
addr_regex = 0x0}
	cur_cmd = (struct sed_cmd *) 0x805a850
	b = (struct buffer *) 0x805ac18
	ch = 115
#3  0x0804c1f3 in compile_string (cur_program=0x0, str=0xbf8b78a2
"s/[^A-Za-z0-9_]/_/g", len=19) at compile.c:1567
	string_expr_count = 1
	ret = (struct vector *) 0xb7f70ff4
#4  0x080496a5 in main (argc=3, argv=0xbf8b5b14) at sed.c:212
	longopts = {{name = 0x805123d "regexp-extended", has_arg = 0, flag =
0x0, val = 114}, {name = 0x805124d "expression", has_arg = 1, flag = 0x0, val =
101}, {name = 0x8051258 "file", has_arg = 1, flag = 0x0, val = 102}, {name =
0x805125d "in-place", has_arg = 2, flag = 0x0, val = 105}, {name = 0x8051266
"line-length", has_arg = 1, flag = 0x0, val = 108}, {name = 0x8051272 "quiet",
has_arg = 0, flag = 0x0, val = 110}, {name = 0x8051278 "posix", has_arg = 0,
flag = 0x0, val = 112}, {name = 0x805127e "silent", has_arg = 0, flag = 0x0, val
= 110}, {name = 0x8051285 "separate", has_arg = 0, flag = 0x0, val = 115}, {name
= 0x805128e "unbuffered", has_arg = 0, flag = 0x0, val = 117}, {name = 0x8051299
"version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x80512a1 "help",
has_arg = 0, flag = 0x0, val = 104}, {name = 0x0, has_arg = 0, flag = 0x0, val =
0}}
	opt = 101
	return_code = 134548320
	cols = 0x0
(gdb) 





-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.15
Locale: LANG=C, LC_CTYPE=C (charmap=ISO-8859-1) (ignored: LC_ALL set to et_EE)

Versions of packages locales depends on:
ii  debconf [debconf-2.0]         1.4.72     Debian configuration management sy
ii  libc6 [glibc-2.3.6-2]         2.3.6-6    GNU C Library: Shared libraries

locales recommends no packages.

-- debconf information:
* locales/default_environment_locale: et_EE
* locales/locales_to_be_generated: All locales, All locales, aa_DJ ISO-8859-1,
aa_DJ.UTF-8 UTF-8, aa_ER UTF-8, aa_ER@saaho UTF-8, aa_ET UTF-8, af_ZA
ISO-8859-1, af_ZA.UTF-8 UTF-8, am_ET UTF-8, an_ES ISO-8859-15, an_ES.UTF-8
UTF-8, ar_AE ISO-8859-6, ar_AE.UTF-8 UTF-8, ar_BH ISO-8859-6, ar_BH.UTF-8 UTF-8,
ar_DZ ISO-8859-6, ar_DZ.UTF-8 UTF-8, ar_EG ISO-8859-6, ar_EG.UTF-8 UTF-8, ar_IN
UTF-8, ar_IQ ISO-8859-6, ar_IQ.UTF-8 UTF-8, ar_JO ISO-8859-6, ar_JO.UTF-8 UTF-8,
ar_KW ISO-8859-6, ar_KW.UTF-8 UTF-8, ar_LB ISO-8859-6, ar_LB.UTF-8 UTF-8, ar_LY
ISO-8859-6, ar_LY.UTF-8 UTF-8, ar_MA ISO-8859-6, ar_MA.UTF-8 UTF-8, ar_OM
ISO-8859-6, ar_OM.UTF-8 UTF-8, ar_QA ISO-8859-6, ar_QA.UTF-8 UTF-8, ar_SA
ISO-8859-6, ar_SA.UTF-8 UTF-8, ar_SD ISO-8859-6, ar_SD.UTF-8 UTF-8, ar_SY
ISO-8859-6, ar_SY.UTF-8 UTF-8, ar_TN ISO-8859-6, ar_TN.UTF-8 UTF-8, ar_YE
ISO-8859-6, ar_YE.UTF-8 UTF-8, az_AZ.UTF-8 UTF-8, be_BY CP1251, be_BY.UTF-8
UTF-8, be_BY@latin UTF-8, bg_BG CP1251, bg_BG.UTF-8 UTF-8, bn_BD UTF-8, bn_IN
UTF-8, br_FR ISO-8859-1, br_FR.UTF-8 UTF-8, br_FR@euro ISO-8859-15, bs_BA
ISO-8859-2, bs_BA.UTF-8 UTF-8, byn_ER UTF-8, ca_AD ISO-8859-15, ca_AD.UTF-8
UTF-8, ca_ES ISO-8859-1, ca_ES.UTF-8 UTF-8, ca_ES@euro ISO-8859-15, ca_FR
ISO-8859-15, ca_FR.UTF-8 UTF-8, ca_IT ISO-8859-15, ca_IT.UTF-8 UTF-8, cs_CZ
ISO-8859-2, cs_CZ.UTF-8 UTF-8, csb_PL UTF-8, cy_GB ISO-8859-14, cy_GB.UTF-8
UTF-8, da_DK ISO-8859-1, da_DK.ISO-8859-15 ISO-8859-15, da_DK.UTF-8 UTF-8, de_AT
ISO-8859-1, de_AT.UTF-8 UTF-8, de_AT@euro ISO-8859-15, de_BE ISO-8859-1,
de_BE.UTF-8 UTF-8, de_BE@euro ISO-8859-15, de_CH ISO-8859-1, de_CH.UTF-8 UTF-8,
de_DE ISO-8859-1, de_DE.UTF-8 UTF-8, de_DE@euro ISO-8859-15, de_LU ISO-8859-1,
de_LU.UTF-8 UTF-8, de_LU@euro ISO-8859-15, dz_BT UTF-8, el_GR ISO-8859-7,
el_GR.UTF-8 UTF-8, en_AU ISO-8859-1, en_AU.UTF-8 UTF-8, en_BW ISO-8859-1,
en_BW.UTF-8 UTF-8, en_CA ISO-8859-1, en_CA.UTF-8 UTF-8, en_DK ISO-8859-1,
en_DK.ISO-8859-15 ISO-8859-15, en_DK.UTF-8 UTF-8, en_GB ISO-8859-1,
en_GB.ISO-8859-15 ISO-8859-15, en_GB.UTF-8 UTF-8, en_HK ISO-8859-1, en_HK.UTF-8
UTF-8, en_IE ISO-8859-1, en_IE.UTF-8 UTF-8, en_IE@euro ISO-8859-15, en_IN UTF-8,
en_NZ ISO-8859-1, en_NZ.UTF-8 UTF-8, en_PH ISO-8859-1, en_PH.UTF-8 UTF-8, en_SG
ISO-8859-1, en_SG.UTF-8 UTF-8, en_US ISO-8859-1, en_US.ISO-8859-15 ISO-8859-15,
en_US.UTF-8 UTF-8, en_ZA ISO-8859-1, en_ZA.UTF-8 UTF-8, en_ZW ISO-8859-1,
en_ZW.UTF-8 UTF-8, eo ISO-8859-3, eo.UTF-8 UTF-8, es_AR ISO-8859-1, es_AR.UTF-8
UTF-8, es_BO ISO-8859-1, es_BO.UTF-8 UTF-8, es_CL ISO-8859-1, es_CL.UTF-8 UTF-8,
es_CO ISO-8859-1, es_CO.UTF-8 UTF-8, es_CR ISO-8859-1, es_CR.UTF-8 UTF-8, es_DO
ISO-8859-1, es_DO.UTF-8 UTF-8, es_EC ISO-8859-1, es_EC.UTF-8 UTF-8, es_ES
ISO-8859-1, es_ES.UTF-8 UTF-8, es_ES@euro ISO-8859-15, es_GT ISO-8859-1,
es_GT.UTF-8 UTF-8, es_HN ISO-8859-1, es_HN.UTF-8 UTF-8, es_MX ISO-8859-1,
es_MX.UTF-8 UTF-8, es_NI ISO-8859-1, es_NI.UTF-8 UTF-8, es_PA ISO-8859-1,
es_PA.UTF-8 UTF-8, es_PE ISO-8859-1, es_PE.UTF-8 UTF-8, es_PR ISO-8859-1,
es_PR.UTF-8 UTF-8, es_PY ISO-8859-1, es_PY.UTF-8 UTF-8, es_SV ISO-8859-1,
es_SV.UTF-8 UTF-8, es_US ISO-8859-1, es_US.UTF-8 UTF-8, es_UY ISO-8859-1,
es_UY.UTF-8 UTF-8, es_VE ISO-8859-1, es_VE.UTF-8 UTF-8, et_EE ISO-8859-1,
et_EE.ISO-8859-15 ISO-8859-15, et_EE.UTF-8 UTF-8, eu_ES ISO-8859-1, eu_ES.UTF-8
UTF-8, eu_ES@euro ISO-8859-15, eu_FR ISO-8859-1, eu_FR.UTF-8 UTF-8, eu_FR@euro
ISO-8859-15, fa_IR UTF-8, fi_FI ISO-8859-1, fi_FI.UTF-8 UTF-8, fi_FI@euro
ISO-8859-15, fo_FO ISO-8859-1, fo_FO.UTF-8 UTF-8, fr_BE ISO-8859-1, fr_BE.UTF-8
UTF-8, fr_BE@euro ISO-8859-15, fr_CA ISO-8859-1, fr_CA.UTF-8 UTF-8, fr_CH
ISO-8859-1, fr_CH.UTF-8 UTF-8, fr_FR ISO-8859-1, fr_FR.UTF-8 UTF-8, fr_FR@euro
ISO-8859-15, fr_LU ISO-8859-1, fr_LU.UTF-8 UTF-8, fr_LU@euro ISO-8859-15, ga_IE
ISO-8859-1, ga_IE.UTF-8 UTF-8, ga_IE@euro ISO-8859-15, gd_GB ISO-8859-15,
gd_GB.UTF-8 UTF-8, gez_ER UTF-8, gez_ER@abegede UTF-8, gez_ET UTF-8,
gez_ET@abegede UTF-8, gl_ES ISO-8859-1, gl_ES.UTF-8 UTF-8, gl_ES@euro
ISO-8859-15, gu_IN UTF-8, gv_GB ISO-8859-1, gv_GB.UTF-8 UTF-8, he_IL ISO-8859-8,
he_IL.UTF-8 UTF-8, hi_IN UTF-8, hr_HR ISO-8859-2, hr_HR.UTF-8 UTF-8, hsb_DE
ISO-8859-2, hsb_DE.UTF-8 UTF-8, hu_HU ISO-8859-2, hu_HU.UTF-8 UTF-8, hy_AM.UTF-8
UTF-8, ia UTF-8, id_ID ISO-8859-1, id_ID.UTF-8 UTF-8, is_IS ISO-8859-1,
is_IS.UTF-8 UTF-8, it_CH ISO-8859-1, it_CH.UTF-8 UTF-8, it_IT ISO-8859-1,
it_IT.UTF-8 UTF-8, it_IT@euro ISO-8859-15, iw_IL ISO-8859-8, iw_IL.UTF-8 UTF-8,
ja_JP.EUC-JP EUC-JP, ja_JP.UTF-8 UTF-8, ka_GE GEORGIAN-PS, ka_GE.UTF-8 UTF-8,
kk_KZ PT154, kk_KZ.UTF-8 UTF-8, kl_GL ISO-8859-1, kl_GL.UTF-8 UTF-8, km_KH
UTF-8, kn_IN UTF-8, ko_KR.EUC-KR EUC-KR, ko_KR.UTF-8 UTF-8, ku_TR ISO-8859-9,
ku_TR.UTF-8 UTF-8, kw_GB ISO-8859-1, kw_GB.UTF-8 UTF-8, ky_KG UTF-8, lg_UG
ISO-8859-10, lg_UG.UTF-8 UTF-8, lo_LA UTF-8, lt_LT ISO-8859-13, lt_LT.UTF-8
UTF-8, lv_LV ISO-8859-13, lv_LV.UTF-8 UTF-8, mg_MG ISO-8859-15, mg_MG.UTF-8
UTF-8, mi_NZ ISO-8859-13, mi_NZ.UTF-8 UTF-8, mk_MK ISO-8859-5, mk_MK.UTF-8
UTF-8, ml_IN UTF-8, mn_MN UTF-8, mr_IN UTF-8, ms_MY ISO-8859-1, ms_MY.UTF-8
UTF-8, mt_MT ISO-8859-3, mt_MT.UTF-8 UTF-8, nb_NO ISO-8859-1, nb_NO.UTF-8 UTF-8,
ne_NP UTF-8, nl_BE ISO-8859-1, nl_BE.UTF-8 UTF-8, nl_BE@euro ISO-8859-15, nl_NL
ISO-8859-1, nl_NL.UTF-8 UTF-8, nl_NL@euro ISO-8859-15, nn_NO ISO-8859-1,
nn_NO.UTF-8 UTF-8, no_NO ISO-8859-1, no_NO.UTF-8 UTF-8, nr_ZA UTF-8, nso_ZA
UTF-8, oc_FR ISO-8859-1, oc_FR.UTF-8 UTF-8, om_ET UTF-8, om_KE ISO-8859-1,
om_KE.UTF-8 UTF-8, pa_IN UTF-8, pl_PL ISO-8859-2, pl_PL.UTF-8 UTF-8, pt_BR
ISO-8859-1, pt_BR.UTF-8 UTF-8, pt_PT ISO-8859-1, pt_PT.UTF-8 UTF-8, pt_PT@euro
ISO-8859-15, ro_RO ISO-8859-2, ro_RO.UTF-8 UTF-8, ru_RU ISO-8859-5, ru_RU.CP1251
CP1251, ru_RU.KOI8-R KOI8-R, ru_RU.UTF-8 UTF-8, ru_UA KOI8-U, ru_UA.UTF-8 UTF-8,
rw_RW UTF-8, sa_IN UTF-8, se_NO UTF-8, si_LK UTF-8, sid_ET UTF-8, sk_SK
ISO-8859-2, sk_SK.UTF-8 UTF-8, sl_SI ISO-8859-2, sl_SI.UTF-8 UTF-8, so_DJ
ISO-8859-1, so_DJ.UTF-8 UTF-8, so_ET UTF-8, so_KE ISO-8859-1, so_KE.UTF-8 UTF-8,
so_SO ISO-8859-1, so_SO.UTF-8 UTF-8, sq_AL ISO-8859-1, sq_AL.UTF-8 UTF-8, sr_CS
ISO-8859-5, sr_CS.UTF-8 UTF-8, ss_ZA UTF-8, st_ZA ISO-8859-1, st_ZA.UTF-8 UTF-8,
sv_FI ISO-8859-1, sv_FI.UTF-8 UTF-8, sv_FI@euro ISO-8859-15, sv_SE ISO-8859-1,
sv_SE.ISO-8859-15 ISO-8859-15, sv_SE.UTF-8 UTF-8, ta_IN UTF-8, te_IN UTF-8,
tg_TJ KOI8-T, tg_TJ.UTF-8 UTF-8, th_TH TIS-620, th_TH.UTF-8 UTF-8, ti_ER UTF-8,
ti_ET UTF-8, tig_ER UTF-8, tl_PH ISO-8859-1, tl_PH.UTF-8 UTF-8, tn_ZA UTF-8,
tr_TR ISO-8859-9, tr_TR.UTF-8 UTF-8, ts_ZA UTF-8, tt_RU.UTF-8 UTF-8, uk_UA
KOI8-U, uk_UA.UTF-8 UTF-8, ur_PK UTF-8, uz_UZ ISO-8859-1, uz_UZ.UTF-8 UTF-8,
uz_UZ@cyrillic UTF-8, ve_ZA UTF-8, vi_VN UTF-8, vi_VN.TCVN TCVN5712-1, wa_BE
ISO-8859-1, wa_BE.UTF-8 UTF-8, wa_BE@euro ISO-8859-15, wo_SN UTF-8, xh_ZA
ISO-8859-1, xh_ZA.UTF-8 UTF-8, yi_US CP1255, yi_US.UTF-8 UTF-8, zh_CN GB2312,
zh_CN.GB18030 GB18030, zh_CN.GBK GBK, zh_CN.UTF-8 UTF-8, zh_HK BIG5-HKSCS,
zh_HK.UTF-8 UTF-8, zh_SG GB2312, zh_SG.GBK GBK, zh_SG.UTF-8 UTF-8, zh_TW BIG5,
zh_TW.EUC-TW EUC-TW, zh_TW.UTF-8 UTF-8, zu_ZA ISO-8859-1, zu_ZA.UTF-8 UTF-8



