
Bug#512525: marked as done (regexp: missing support for non-localized but utf8 environment)



Your message dated Sun, 11 Oct 2020 01:56:57 +0200
with message-id <20201010235657.z7kjagdr6yhq5bal@function>
and subject line Re: Bug#512525: regexp: missing support for non-localized but utf8 environment
has caused the Debian Bug report #512525,
regarding regexp: missing support for non-localized but utf8 environment
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
512525: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512525
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: libc6
Version: 2.7-18
Severity: normal

Hello,

My goal is to grep for intervals of Unicode characters in UTF-8 files.
However, character intervals depend on the locale, so I have to set
LC_COLLATE to C; but doing so makes grep unaware that my files are
UTF-8, so I set LC_CTYPE to a UTF-8 locale. However, that fails:

$ LANG=C LC_CTYPE=fr_FR.UTF-8 grep '[é-ë]' test.txt
grep: Invalid collation character

which comes from libc's re_compile_pattern() function.

Samuel

-- System Information:
Debian Release: 5.0
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.28 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1                      1:4.3.2-1.1 GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
ii  glibc-doc                     2.7-18     GNU C Library: Documentation
ii  locales                       2.7-18     GNU C Library: National Language (

-- debconf information excluded

-- 
Samuel
We are Pentium of Borg. Division is futile. You will be approximated.
(seen in someone's .signature)



--- End Message ---
--- Begin Message ---
Hello,

John Scott wrote on Sat, 10 Oct 2020 09:29:09 -0400:
> On Wednesday, January 21, 2009 8:27:19 AM EDT Samuel Thibault wrote:
> > My goal is to grep for intervals of Unicode characters in UTF-8 files.
> > However, character intervals depend on the locale, so I have to set
> > LC_COLLATE to C; but doing so makes grep unaware that my files are
> > UTF-8, so I set LC_CTYPE to a UTF-8 locale. However, that fails:
> > 
> > $ LANG=C LC_CTYPE=fr_FR.UTF-8 grep '[é-ë]' test.txt
> > grep: Invalid collation character
> > 
> > which comes from libc's re_compile_pattern() function.
> For this you could try the C.UTF-8 locale.

Ah, that appeared in the meantime indeed :)
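
A minimal sketch of that approach, assuming the C.UTF-8 locale is available on the system (check with `locale -a`). The test.txt contents below are made up for illustration. The idea is that C.UTF-8 supplies UTF-8 character handling and code-point-order collation from a single locale, so LC_CTYPE and LC_COLLATE no longer have to be mixed:

```shell
#!/bin/sh
# Sketch: match the code-point range U+00E9..U+00EB (é, ê, ë) in a UTF-8 file.
# Assumption: a C.UTF-8 locale is installed (see `locale -a`).
# test.txt is a hypothetical sample file created here for illustration;
# octal escapes keep the printf portable: \303\251 is é, \303\252 is ê.
printf 'caf\303\251\nfor\303\252t\nabc\n' > test.txt

# One locale provides both UTF-8 decoding (LC_CTYPE) and code-point
# collation (LC_COLLATE), so no LC_COLLATE=C override is needed.
LC_ALL=C.UTF-8 grep '[é-ë]' test.txt
```

With the sample file above, grep should print the two lines containing é and ê and skip the plain ASCII line.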

Samuel

--- End Message ---
