Bug#214107: locales: en_US.UTF-8 treats [ as a space char

To: Petter Reinholdtsen <pere@hungry.com>, 214107@bugs.debian.org
Subject: Bug#214107: locales: en_US.UTF-8 treats [ as a space char
From: GOTO Masanori <gotom@debian.or.jp>
Date: Tue, 04 Nov 2003 00:52:09 +0900
Message-id: <80vfq18n6e.wl@oris.opensource.jp>
Reply-to: GOTO Masanori <gotom@debian.or.jp>, 214107@bugs.debian.org
In-reply-to: <E1AFtz1-0001z0-00@saruman.uio.no>
References: <E1AFtz1-0001z0-00@saruman.uio.no>

At Sat, 1 Nov 2003 12:31:55 +0100,
Petter Reinholdtsen wrote:
> When I test the same using woody with locales version 2.2.5-11.5, it
> works as it should.
> 
>   minerva:~# grep en_US /etc/locale.gen
>   en_US UTF-8
>   minerva:~# locale-gen
>   Generating locales...
>     [...]
>     en_US.UTF-8... done
>     [...]
>   Generation complete.
>   minerva:~# echo '[' | LANG=en_US.UTF-8 egrep '^[^[:space:]]+$'
>   [
>   minerva:~# echo '[' | LANG=en_US egrep '^[^[:space:]]+$'
>   [
>   minerva:~# echo '[' | LANG=C egrep '^[^[:space:]]+$'
>   [
>   minerva:~#
> 
> Comparing the en_US locale, there are no changes.  But both include
> the i18n "locale", and this changed between the versions.  It was
> updated to use Unicode version 3.2 from version 3.0.  I do not know the
> CTYPE part of the locale well enough to say if the changes are
> relevant or not.  I found nothing in the change modifying the
> behaviour of <U005B>, which I believe is the value for '['.
> 
> Could it be some regex error in glibc?

With 2.3.2.ds1-9,

	$ echo '[' | LANG=C egrep '[[:space:]]+$'
	$ echo '[' | LANG=C egrep '^[^[:space:]]+$'
	[

Why don't you use [[:space:]] instead of ^[^[:space:]], or am I missing
something?  However, I wonder why I get these results:

	$ echo '[' | LANG=en_US egrep '[[:space:]]+$'
	$ echo '[' | LANG=en_US egrep '^[^[:space:]]+$'
	[
	$ echo '[' | LANG=en_US.UTF-8 egrep '[[:space:]]+$'
	[
	$ echo '[' | LANG=en_US.UTF-8 egrep '^[^[:space:]]+$'
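
The comparison above can be repeated for several locales with a small
script.  This is only a sketch: check_bracket is a hypothetical helper
name, and it sets LC_ALL rather than LANG (an assumption on my side, so
that no other LC_* variable in the environment overrides the ctype
setting).  If a locale is not installed, grep silently falls back to the
C locale, so generate the locale first.

```shell
#!/bin/sh
# check_bracket (hypothetical helper): report whether '[' matches the
# positive class [[:space:]] and the negated class [^[:space:]]
# in the locale given as the first argument.
check_bracket() {
    loc=$1

    # Positive class: '[' is not whitespace, so this should NOT match.
    if echo '[' | LC_ALL=$loc grep -E '[[:space:]]+$' >/dev/null; then
        pos=match
    else
        pos=nomatch
    fi

    # Negated class: '[' is a non-space character, so this SHOULD match.
    if echo '[' | LC_ALL=$loc grep -E '^[^[:space:]]+$' >/dev/null; then
        neg=match
    else
        neg=nomatch
    fi

    echo "$loc space=$pos nonspace=$neg"
}

# Locales taken from the transcripts above.
for loc in C en_US en_US.UTF-8; do
    check_bracket "$loc"
done
```

A locale that classifies '[' correctly should report "space=nomatch
nonspace=match"; the en_US.UTF-8 results shown above would come out as
"space=match nonspace=nomatch".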

Regards,
-- gotom
