
Re: OT: Python (was: Make Unicode bugs release critical?)



On Wed, Feb 16, 2011 at 01:01:07AM +0100, Vincent Lefevre wrote:
> On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> > When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> > characters to stdout should use UTF-8.  That's what LC_TYPE means.
> 
> So, "cat", "grep", etc. are all broken. :)

How come?

"cat" copies its input through byte for byte, without interpreting it.  So
for any valid UTF-8 character on input it prints a valid UTF-8 character
on output, and for any valid ISO-8859-1 character on input it prints a
valid ISO-8859-1 character on output.
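
To illustrate why a byte-transparent tool cannot break either encoding,
here is a minimal Python sketch.  It is not how cat is implemented; the
names cat() and roundtrip() are made up for this example.

```python
import io
import shutil

def cat(src, dst):
    """Copy src to dst as raw bytes, never decoding them."""
    shutil.copyfileobj(src, dst)

def roundtrip(data: bytes) -> bytes:
    """Hypothetical helper: push bytes through cat() and collect the output."""
    out = io.BytesIO()
    cat(io.BytesIO(data), out)
    return out.getvalue()

utf8_in = "ą\n".encode("utf-8")         # two bytes for ą, plus newline
latin1_in = "é\n".encode("iso-8859-1")  # one byte for é, plus newline

# Whatever encoding went in comes out unchanged.
utf8_out = roundtrip(utf8_in)
latin1_out = roundtrip(latin1_in)
```

Because nothing is ever decoded, the locale plays no part in the result.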

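"grep", on the other hand, has to actually understand the encoding -- and
it does.  Try this:
$ echo "ą"|LC_CTYPE=C grep --color=always .
The output is mangled: in the C locale "." matches one byte at a time, so
the color escapes end up between the two bytes of the UTF-8 sequence.
$ echo "ą"|LC_CTYPE=en_US.utf-8 grep --color=always .
This is handled correctly: "." matches the whole character, and the
escapes go around it.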
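
The difference can be imitated in Python as a rough sketch.  This only
mimics the effect using the re module and is not grep's actual code; the
function names and the START/END escape strings are stand-ins chosen for
the example.

```python
import re

# Stand-in color escapes, assumed for illustration only.
START = "\x1b[01;31m"
END = "\x1b[m"

def highlight_bytes(raw: bytes) -> bytes:
    """Like a byte-oriented locale: '.' matches a single byte."""
    s, e = START.encode("ascii"), END.encode("ascii")
    return re.sub(rb".", lambda m: s + m.group(0) + e, raw)

def highlight_chars(raw: bytes) -> bytes:
    """Like a UTF-8 locale: decode first, so '.' matches a whole character."""
    text = raw.decode("utf-8")
    colored = re.sub(r".", lambda m: START + m.group(0) + END, text)
    return colored.encode("utf-8")

def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

line = "ą\n".encode("utf-8")

mangled = highlight_bytes(line)   # escapes split the two bytes of ą
correct = highlight_chars(line)   # escapes wrap ą as one unit
```

Here is_valid_utf8(mangled) is False, because the lead byte of ą is
followed by an escape sequence rather than its continuation byte, while
is_valid_utf8(correct) is True.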

-- 
1KB		// Microsoft corollary to Hanlon's razor:
		//	Never attribute to stupidity what can be
		//	adequately explained by malice.

