
Bug#214107: locales: en_US.UTF-8 treats [ as a space char



At Sat, 1 Nov 2003 12:31:55 +0100,
Petter Reinholdtsen wrote:
> When I test the same using woody with locales version 2.2.5-11.5, it
> works as it should.
> 
>   minerva:~# grep en_US /etc/locale.gen
>   en_US UTF-8
>   minerva:~# locale-gen
>   Generating locales...
>     [...]
>     en_US.UTF-8... done
>     [...]
>   Generation complete.
>   minerva:~# echo '[' | LANG=en_US.UTF-8 egrep '^[^[:space:]]+$'
>   [
>   minerva:~# echo '[' | LANG=en_US egrep '^[^[:space:]]+$'
>   [
>   minerva:~# echo '[' | LANG=C egrep '^[^[:space:]]+$'
>   [
>   minerva:~#
> 
> Comparing the en_US locale, there are no changes.  But both include
> the i18n "locale", and this changed between the versions.  It was
> updated to use Unicode version 3.2 from version 3.0.  I do not know the
> CTYPE part of the locale well enough to say if the changes are
> relevant or not.  I found nothing in the change modifying the
> behaviour of <U005B>, which I believe is the value for '['.
> 
> Could it be some regex error in glibc?

With 2.3.2.ds1-9,

	$ echo '[' | LANG=C egrep '[[:space:]]+$'
	$ echo '[' | LANG=C egrep '^[^[:space:]]+$'
	[

Why don't you use [[:space:]] instead of ^[^[, or am I missing something?
However, I wonder why I get these results:

	$ echo '[' | LANG=en_US egrep '[[:space:]]+$'
	$ echo '[' | LANG=en_US egrep '^[^[:space:]]+$'
	[
	$ echo '[' | LANG=en_US.UTF-8 egrep '[[:space:]]+$'
	[
	$ echo '[' | LANG=en_US.UTF-8 egrep '^[^[:space:]]+$'
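	The same comparison can be run for several locales in one go. What
	follows is a minimal sketch, not part of the original report.
	`classify_bracket` is a hypothetical helper. It uses `grep -E`, which
	is equivalent to `egrep`. It sets LC_ALL instead of LANG, because
	LC_ALL overrides both LANG and LC_CTYPE and would otherwise mask the
	setting. A one-character line either ends in whitespace or consists
	only of non-whitespace, so exactly one of the two patterns should
	match. Any other outcome points at the locale's CTYPE data or at the
	regex matcher. A locale that has not been generated with locale-gen
	silently falls back to C behaviour, so check that en_US.UTF-8 is
	installed first.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report how grep -E
# classifies the single character '[' under the locale given as $1.
# For a one-character line exactly one of the two patterns should match.
classify_bracket() {
    sp=no
    nsp=no
    # LC_ALL takes precedence over LANG and LC_CTYPE, so the caller's
    # own locale variables cannot interfere with the test.
    if printf '[\n' | LC_ALL="$1" grep -Eq '[[:space:]]+$'; then
        sp=yes
    fi
    if printf '[\n' | LC_ALL="$1" grep -Eq '^[^[:space:]]+$'; then
        nsp=yes
    fi
    case "$sp$nsp" in
        noyes) echo nonspace ;;      # expected: '[' is not whitespace
        yesno) echo space ;;         # the behaviour reported in this bug
        *)     echo inconsistent ;;  # both or neither pattern matched
    esac
}

for loc in C en_US en_US.UTF-8; do
    printf '%-12s %s\n' "$loc" "$(classify_bracket "$loc")"
done
```

	Judging from the 2.3.2.ds1-9 transcript above, an affected system
	would be expected to print `space` for en_US.UTF-8 and `nonspace` for
	C and en_US.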

Regards,
-- gotom


