
Bug#318923: xterm: localization hell again (even with UTF8 locales)



On Mon, Jul 18, 2005 at 05:36:02PM -0400, Thomas Dickey wrote:

> /usr/build/xterm/xterm-203b (101) ./testit
> locale: Cannot set LC_CTYPE to default locale: No such file or directory
> locale: Cannot set LC_MESSAGES to default locale: No such file or directory
> locale: Cannot set LC_ALL to default locale: No such file or directory
> LANG=LANG=uk_UA.UTF-8
> LC_CTYPE="LANG=uk_UA.UTF-8"
> LC_NUMERIC="LANG=uk_UA.UTF-8"
> LC_TIME="LANG=uk_UA.UTF-8"
> LC_COLLATE="LANG=uk_UA.UTF-8"
> LC_MONETARY="LANG=uk_UA.UTF-8"
> LC_MESSAGES="LANG=uk_UA.UTF-8"
> LC_PAPER="LANG=uk_UA.UTF-8"
> LC_NAME="LANG=uk_UA.UTF-8"
> LC_ADDRESS="LANG=uk_UA.UTF-8"
> LC_TELEPHONE="LANG=uk_UA.UTF-8"
> LC_MEASUREMENT="LANG=uk_UA.UTF-8"
> LC_IDENTIFICATION="LANG=uk_UA.UTF-8"
> LC_ALL=LANG=uk_UA.UTF-8
> Mon Jul 18 17:28:28 EDT 2005
> /usr/build/xterm/xterm-203b (102)

	The 'locale' printout above looks very strange. Perhaps you should
	run 'with-locale' with 'uk_UA.UTF-8' as the first parameter, not the
	entire assignment 'LANG=uk_UA.UTF-8': the script does
	'export LANG=$1', so LANG ends up as 'LANG=uk_UA.UTF-8'. Btw, your
	'date' prints an English date, not a Cyrillic/Ukrainian one, and the
	'locale: Cannot set ...' diagnostics are not a good sign either.

> testit:
> #!/bin/sh
> with-locale LANG=uk_UA.UTF-8 locale
> with-locale LANG=uk_UA.UTF-8 date
>
> with-locale:
> #!/bin/sh
> unset LANG
> unset LC_ALL
> unset LC_CTYPE
> export LANG=$1
> export LC_ALL=$1
> shift
> $*
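
	A minimal sketch of the usage suggested above, assuming
	'with-locale' is meant to take a bare locale name as its first
	argument. The function name 'with_locale', the subshell body and
	the use of "$@" instead of $* are illustrative choices, not part
	of the quoted script:

```sh
#!/bin/sh
# Sketch only: run a command with LANG and LC_ALL set to the locale
# named by the first argument (a bare name such as uk_UA.UTF-8,
# NOT an assignment such as LANG=uk_UA.UTF-8).
with_locale() (
    # The parentheses make the body a subshell, so the caller's
    # environment is left untouched.
    unset LC_CTYPE
    LANG=$1
    LC_ALL=$1
    export LANG LC_ALL
    shift
    # "$@" keeps each remaining argument intact, even with spaces.
    "$@"
)

# Intended call:   with_locale uk_UA.UTF-8 date
# Problematic call (reproduces LANG=LANG=uk_UA.UTF-8 in 'locale'):
#                  with_locale LANG=uk_UA.UTF-8 locale
```

	Whether 'date' then prints Ukrainian names still depends on the
	uk_UA.UTF-8 locale being installed on the system.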


> For the sake of discussion, a typescript file from your session, showing
> the output of locale and the output of date would show enough information
> to see if it is encoded in UTF-8, and allow me (by cat'ing to the terminal)
> to see what xterm would display.

	Here is the 'locale' printout from the 'problem' xterm:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

	Here is what 'LANG=uk_UA.UTF-8 date' looks like there:

	nnn nnn 19 12:59:53 EEST 2005
	(with the Terminus fontFace; with the fixed/clean fontFaces it
	prints dotted squares instead of 'n's)

	Here is the 'locale' printout from the Cyrillized/Ukrainized xterm:

LANG=uk_UA.UTF-8
LC_CTYPE="uk_UA.UTF-8"
LC_NUMERIC="uk_UA.UTF-8"
LC_TIME="uk_UA.UTF-8"
LC_COLLATE="uk_UA.UTF-8"
LC_MONETARY="uk_UA.UTF-8"
LC_MESSAGES="uk_UA.UTF-8"
LC_PAPER="uk_UA.UTF-8"
LC_NAME="uk_UA.UTF-8"
LC_ADDRESS="uk_UA.UTF-8"
LC_TELEPHONE="uk_UA.UTF-8"
LC_MEASUREMENT="uk_UA.UTF-8"
LC_IDENTIFICATION="uk_UA.UTF-8"
LC_ALL=

	Here is how 'date' looks there:

	Втр Лип 19 13:02:49 EEST 2005

	The 'date' output passed through 'xxd' looks the same in both xterms:

0000000: d092 d182 d180 20d0 9bd0 b8d0 bf20 3139  ...... ...... 19

	I.e. the Cyrillic letters are encoded as usual in UTF-8, with a
	leading byte of 'd0' or 'd1'.
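
	A small sketch for checking this by hand, taking the first 13
	bytes of the xxd dump above. The function names are made up for
	illustration; only 'printf', 'od', 'tr' and 'grep' are used:

```sh
#!/bin/sh
# Emit the first 13 bytes of the dump (d0 92 d1 82 d1 80 20 d0 9b d0 b8
# d0 bf) as octal escapes, so this script itself stays pure ASCII.
date_prefix_bytes() {
    printf '\320\222\321\202\321\200 \320\233\320\270\320\277'
}

# Count the bytes equal to d0 or d1. In UTF-8 these are the lead bytes
# of every two-byte sequence in the range U+0400..U+047F.
count_cyrillic_lead_bytes() {
    date_prefix_bytes | od -An -tx1 | tr -s ' ' '\n' | grep -c -E '^d[01]$'
}

# On a terminal that interprets UTF-8, 'date_prefix_bytes; echo'
# shows the weekday and month abbreviations from the 'date' output.
```

	Six lead bytes for six letters plus one space matches the two
	three-letter abbreviations.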
	
	Moreover, when I select 'nnn nnn' (or the dotted squares) in the
	problem/English xterm and paste it into the Cyrillized one, it
	pastes perfectly well, giving 'Втр Лип' there.

	I was so glad to be working with xterm and Unicode locales. I
	thought the dream of a mixed-language environment had finally been
	realized :> But alas ... :> The new changes have brought back the
	old griefs :>



