
Bug#512525: regexp: missing support for non-localized but utf8 environment



Package: libc6
Version: 2.7-18
Severity: normal

Hello,

My goal is to grep for ranges of Unicode characters in UTF-8 files.
However, character ranges depend on the locale, so I have to set
LC_COLLATE to C. Doing that (via LANG=C) leaves grep unaware that my
files are UTF-8, so I also set LC_CTYPE to a UTF-8 locale. That
combination fails:

$ LANG=C LC_CTYPE=fr_FR.UTF-8 grep '[é-ë]' test.txt
grep: Invalid collation character

which comes from libc's re_compile_pattern() function.
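
To make the failure easier to reproduce, here is a minimal sketch that
runs the same range pattern under two locale combinations and prints
grep's exit status for each (0 = match, 1 = no match, 2 = error, as with
the "Invalid collation character" message above). The contents of
test.txt are an assumption, since the file itself is not part of this
report, and try_range is a hypothetical helper name. It also assumes the
fr_FR.UTF-8 locale is installed; if it is not, grep will most likely just
run in the C locale and the problem will not show up.

```shell
#!/bin/sh
# Assumed sample data: one line containing "café" encoded as UTF-8.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'caf\303\251\n' > "$tmpdir/test.txt"

# The pattern [é-ë], written as explicit UTF-8 bytes so the script does
# not depend on the encoding of the script file itself.
pattern=$(printf '[\303\251-\303\253]')

# try_range COLLATE CTYPE (hypothetical helper)
# Runs grep with LANG=C plus the given LC_COLLATE and LC_CTYPE values and
# prints grep's exit status. LC_ALL is unset inside the subshell so that
# the per-category variables actually take effect.
try_range() {
    collate=$1
    ctype=$2
    (
        unset LC_ALL
        LANG=C LC_COLLATE=$collate LC_CTYPE=$ctype \
            grep -c "$pattern" "$tmpdir/test.txt" >/dev/null 2>&1
        echo $?
    )
}

echo "LC_COLLATE=C LC_CTYPE=fr_FR.UTF-8: exit status $(try_range C fr_FR.UTF-8)"
echo "LC_COLLATE=C LC_CTYPE=C:           exit status $(try_range C C)"
```

On an affected system I would expect the first line to report status 2
and the second to report a normal status, but the exact result depends on
the libc and grep versions in use.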

Samuel

-- System Information:
Debian Release: 5.0
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.28 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1                      1:4.3.2-1.1 GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
ii  glibc-doc                     2.7-18     GNU C Library: Documentation
ii  locales                       2.7-18     GNU C Library: National Language (

-- debconf information excluded

-- 
Samuel
We are Pentium of Borg. Division is futile. You will be approximated.
(seen in someone's .signature)


