
Re: gawk: Odd regexp matching problem if LANG=ja_JP



On August 18, 2004 at 2:57PM +0900,
miles (at lsi.nec.co.jp) wrote:

> Package: gawk
> Version: 1:3.1.4-1

> Executing the following line in a shell:
> 
>    echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=ja_JP gawk '/[Cc]hangeLog/ { print }'
> 
> yields not the expected two lines of output, but instead only the first one:
> 
>    --- orig/lisp/ChangeLog
> 
> 
> If the LANG-setting portion is changed to use C, then it works as
> expected (others such as "de" seem to work too):
> 
>    echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=C gawk '/[Cc]hangeLog/ { print }'
> 
> yields:
> 
>    --- orig/lisp/ChangeLog
>    +++ mod/lisp/ChangeLog
> 
> 
> I'm not sure if the actual encoding has any impact -- ja_JP, ja_JP.utf8,
> and ja_JP.eucjp all exhibit the same problem.

The ko_KR, zh_CN, and zh_TW locales exhibit the same problem.  On CJK
locales, this bug makes gawk scripts unusable.

Downgrading gawk to version 1:3.1.3-3 prevents the problem.
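
For anyone who wants to compare locales or gawk versions quickly, a
small sketch along these lines might help.  The helper name
count_matches and the AWK override variable are my own inventions for
illustration, not part of gawk.  It sets LC_ALL rather than LANG so
that an existing LC_ALL in the environment cannot mask the test, and
it assumes the listed locales are actually installed (check with
"locale -a"); if a locale is missing, gawk falls back to the C locale
and the problem will not show up.

```shell
#!/bin/sh
# Hypothetical helper: print how many of the two test lines match
# /[Cc]hangeLog/ when awk runs under the locale given as $1.
# AWK defaults to gawk; set AWK to try another implementation.
count_matches() {
    printf '%s\n' '--- orig/lisp/ChangeLog' '+++ mod/lisp/ChangeLog' |
        LC_ALL="$1" "${AWK:-gawk}" '/[Cc]hangeLog/ { n++ } END { print n + 0 }'
}

# Locale names taken from this thread; both lines should match in each.
for loc in C ja_JP ko_KR zh_CN zh_TW; do
    printf '%s: %s of 2 lines matched\n' "$loc" "$(count_matches "$loc")"
done
```

A correct gawk should report 2 of 2 for every locale; on an affected
version, the CJK locales would be expected to report fewer.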

Could anyone fix this bug?

Thanks,
-- 
Tatsuya Kinoshita
