
Re: What does charset in locale setting affect?



Roger Leigh wrote:
On Sat, Sep 01, 2012 at 07:32:48PM -0400, Dan B. wrote:
...

Which common programs (e.g., getty, xterm/etc., sed/grep?) do something
different based on the charset portion of the locale setting?

All of them, in short.

When you run a terminal emulator such as xterm, it will get the
encoding to use inside the emulator using nl_langinfo(3). ...
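
For reference, the value a program sees from nl_langinfo(CODESET) can be checked from the shell with "locale charmap". The following is only a sketch; codeset_for is a made-up helper name, and the en_US.UTF-8 line assumes that locale has been generated on the system.

```shell
# Print the character encoding that a locale-aware program would get from
# nl_langinfo(CODESET) under the given locale name.
# codeset_for is a hypothetical helper name used only in this sketch.
codeset_for() {
    (
        # LC_ALL overrides LANG and every other LC_* variable.
        LC_ALL="$1"; export LC_ALL
        locale charmap 2>/dev/null
    )
}

codeset_for C             # on glibc this typically prints ANSI_X3.4-1968
codeset_for en_US.UTF-8   # prints UTF-8 if that locale is installed
```

If the named locale is not installed, glibc falls back to the C locale, so the second call may report the same thing as the first.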


What about the virtual consoles?

Whether I choose a UTF-8 default system locale or None (in the
"dpkg-reconfigure locales" dialog), after logging out and back in (to
make sure the shell gets fresh settings),

  echo $'\xC2\xA2'

displays the same thing (the cent sign).
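
A sketch of why this test probably says more about the console than about the locale: the shell writes the same two bytes (C2 A2) whatever the locale is, so how they are drawn is up to whatever decodes them. The helper name cent_bytes_hex is made up for illustration, and the /sys path is an assumption about a Linux system with the vt driver; it may be absent.

```shell
# Emit the UTF-8 encoding of U+00A2 (cent sign) under locale $1 and dump
# the bytes in hex. Octal escapes are used because POSIX printf supports
# them; $'\xC2\xA2' is the bash-specific spelling of the same bytes.
# cent_bytes_hex is a hypothetical helper name used only in this sketch.
cent_bytes_hex() {
    (
        LC_ALL="$1"; export LC_ALL
        printf '\302\242' | od -An -tx1
    )
}

cent_bytes_hex C             # hex dump shows c2 a2
cent_bytes_hex en_US.UTF-8   # same two bytes

# On Linux, the kernel's default mode for new virtual consoles is exposed
# here (1 means UTF-8). Assumption: the file exists on the running system.
if [ -r /sys/module/vt/parameters/default_utf8 ]; then
    cat /sys/module/vt/parameters/default_utf8
fi
```

Since both calls produce identical bytes, seeing the cent sign in both cases suggests the console was in UTF-8 mode both times, independent of the locale setting.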

Is the virtual console supposed to follow the locale's character
encoding?  If so, does something else (e.g., something in /etc/init.d/)
need to be run to make a difference?


No, I'm not actually trying to turn off using UTF-8.  I'm just trying
to find out how things work (what actually is affected by the locale
settings).


Actually, what I really want to know is how to revert the sorting of
file names from ls (and Emacs dired listings) from the order caused
by having "en_US" in LANG=en_US.UTF-8 back to the traditional (old)
Unix order (e.g., what LANG=C would yield) without messing up all the
UTF-8 support that's all over Linux now.


First of all, can UTF-8 be combined with the "C" locale as in
LANG=C.UTF-8?

Or do I want something closer to LANG=en_US.UTF-8 LC_COLLATE=C
(to reduce the number of locale settings I'm overriding)?
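
A small sketch of the second variant, to make the question concrete. It relies on the usual precedence (LC_ALL over LC_COLLATE over LANG), so LC_ALL must be unset for LC_COLLATE to take effect. The name sort_c_collate is made up for illustration, and whether any C.UTF-8 locale exists depends on the installed libc.

```shell
# Sort stdin with only LC_COLLATE forced to C; LANG keeps supplying the
# character set and the remaining categories.
# sort_c_collate is a hypothetical helper name used only in this sketch.
sort_c_collate() {
    (
        unset LC_ALL    # LC_ALL, if set, would override LC_COLLATE
        LANG=en_US.UTF-8 LC_COLLATE=C sort
    )
}

# Byte order puts all uppercase letters before lowercase: A B a b
printf '%s\n' b A a B | sort_c_collate

# List any C.UTF-8 variants that this system provides (may print nothing).
locale -a 2>/dev/null | grep -i '^C\.utf' || true
```

The same two variables exported in the login environment would apply to ls as well, since it uses the same collation category for ordering names.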



When you run sed/grep, the encoding will affect how it processes the
text.

Are you sure about sed?

I tried probing how LANG= vs. LANG=en_US.UTF-8 affects whether
the regular expression "[a-z]" matches "X".  Grep seems to be
affected as expected, but sed never matched under either setting.
(That's on Squeeze.)
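
Roughly, the probe looks like the sketch below. probe_match is a made-up helper name; the pattern must not contain "/" because it is pasted into a sed address. Only the C-locale result is predictable; what en_US.UTF-8 gives depends on the libc and tool versions.

```shell
# Report whether grep and sed match pattern $2 against the line $3 under
# locale $1. probe_match is a hypothetical helper name used only in this sketch.
probe_match() {
    (
        LC_ALL="$1"; export LC_ALL
        if printf '%s\n' "$3" | grep -q -- "$2"; then
            g=match
        else
            g=no-match
        fi
        # sed -n with /re/p prints the line only when the pattern matches.
        if [ -n "$(printf '%s\n' "$3" | sed -n "/$2/p")" ]; then
            s=match
        else
            s=no-match
        fi
        printf 'grep: %s  sed: %s\n' "$g" "$s"
    )
}

probe_match C '[a-z]' X             # grep: no-match  sed: no-match
probe_match en_US.UTF-8 '[a-z]' X   # result depends on libc and tool versions
```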

Daniel



