Bug#214107: locales: en_US.UTF-8 treats [ as a space char
At Sat, 1 Nov 2003 12:31:55 +0100,
Petter Reinholdtsen wrote:
> When I test the same using woody with locales version 2.2.5-11.5, it
> works as it should.
>
> minerva:~# grep en_US /etc/locale.gen
> en_US UTF-8
> minerva:~# locale-gen
> Generating locales...
> [...]
> en_US.UTF-8... done
> [...]
> Generation complete.
> minerva:~# echo '[' | LANG=en_US.UTF-8 egrep '^[^[:space:]]+$'
> [
> minerva:~# echo '[' | LANG=en_US egrep '^[^[:space:]]+$'
> [
> minerva:~# echo '[' | LANG=C egrep '^[^[:space:]]+$'
> [
> minerva:~#
>
> Comparing the en_US locale, there are no changes. But both include
> the i18n "locale", and this changed between the versions. It was
> updated to use Unicode version 3.2 from version 3.0. I do not know the
> CTYPE part of the locale well enough to say if the changes are
> relevant or not. I found nothing in the change modifying the
> behaviour of <U005B>, which I believe is the value for '['.
>
> Could it be some regex error in glibc?
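One way to separate the two suspects (locale CTYPE data versus regex code) is to ask a tool that does not go through the regex engine. The sketch below assumes that tr classifies bytes through the C library's character-class functions; the function name ctype_keeps_bracket is made up for illustration. tr works on single bytes, so this is only a partial check for a UTF-8 locale, and a locale that is not installed silently falls back to C.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if '[' survives deletion of [:space:]
# characters under the locale given as $1.
# Assumption: tr classifies bytes via the C library, not via regex code.
ctype_keeps_bracket() {
    out=$(printf '[' | LC_ALL=$1 tr -d '[:space:]')
    [ "$out" = '[' ]
}

for loc in C en_US en_US.UTF-8; do
    if ctype_keeps_bracket "$loc"; then
        echo "$loc: tr does not treat '[' as space"
    else
        echo "$loc: tr deleted '[', so it is classified as space"
    fi
done
```

If '[' survives here but egrep still matches it against [[:space:]], that points towards the regex code rather than the locale definition.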
With 2.3.2.ds1-9,
$ echo '[' | LANG=C egrep '[[:space:]]+$'
$ echo '[' | LANG=C egrep '^[^[:space:]]+$'
[
Why don't you use [[:space:]] instead of ^[^[:space:]], or am I missing something?
However, I wonder why I get this result:
$ echo '[' | LANG=en_US egrep '[[:space:]]+$'
$ echo '[' | LANG=en_US egrep '^[^[:space:]]+$'
[
$ echo '[' | LANG=en_US.UTF-8 egrep '[[:space:]]+$'
[
$ echo '[' | LANG=en_US.UTF-8 egrep '^[^[:space:]]+$'
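Because an empty line of output is easy to misread in the transcript above, the same six tests can be run with an explicit match / no match label. This is only a sketch: check_bracket is a made-up name, grep -E is used in place of egrep, and LC_ALL is set instead of LANG so that no other LC_* variable in the environment can override the locale under test.

```shell
#!/bin/sh
# Hypothetical helper: for the locale given as $1, report whether the
# single character '[' matches each of the two patterns from this thread.
check_bracket() {
    loc=$1
    for pat in '[[:space:]]+$' '^[^[:space:]]+$'; do
        # grep -q prints nothing; its exit status is 0 on a match.
        if printf '[\n' | LC_ALL=$loc grep -E -q -- "$pat"; then
            result=match
        else
            result='no match'
        fi
        printf '%s\t%s\t%s\n' "$loc" "$pat" "$result"
    done
}

for loc in C en_US en_US.UTF-8; do
    check_bracket "$loc"
done
```

On a correctly behaving system every locale should report "no match" for the first pattern and "match" for the second; the en_US.UTF-8 results above are the reverse of that.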
Regards,
-- gotom