
Re: Problem with console and locales



[ Please put your answers/reactions in the quoted older message instead
  of on top of it. This makes it easier for the other people on the list
  (who may not remember our earlier conversation) to follow the
  discussion. ]

On Sun, Feb 03, 2008 at 21:23:35 -0200, Andres Migliazzo wrote:
> Awesome... we are on the road now. I've tried the console-setup package in
> combination with the console-terminus fonts as you told me, but the issue
> still remains. When I use "more" I see a "white square" instead of the
> "á", "ú" or "ñ" characters, and when I check the text file with aspell it
> does not show these special characters either (it shows "C rdoba" instead
> of "Córdoba").

Maybe the file itself is not encoded in utf-8. Try running "file" on it:

$ file test1.txt test2.txt
test1.txt: UTF-8 Unicode text
test2.txt: ISO-8859 text

(I created two short test files for this demonstration.)

$ hd test1.txt test2.txt
00000000  43 c3 b3 72 64 6f 62 61  0a 43 f3 72 64 6f 62 61  |C..rdoba.C.rdoba|
00000010  0a                                                |.|

As you can see, the "ó" character is encoded as 0xc3 0xb3 (2 bytes) in
utf-8 but as 0xf3 (1 byte) in iso8859-1, so a program that assumes the
wrong encoding will misread it:

$ cat test1.txt test2.txt
Córdoba
C�rdoba

My utf-8-based terminal cannot decode the iso-8859-encoded "ó" and
prints a placeholder symbol instead. The "�" shows up as a question mark
in my X terminal and as a white square on my tty with the terminus font.
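If you want to see the same thing from a program's point of view, here is
a small Python sketch (my own illustration, not something the tools above
use) that reproduces the bytes from the hd dump and the placeholder:

```python
# Reproduce the two encodings of "Córdoba" from the hd dump and show why a
# utf-8 terminal prints a placeholder for the iso-8859-1 version.
word = "Córdoba"

utf8_bytes = word.encode("utf-8")         # "ó" -> c3 b3 (2 bytes)
latin1_bytes = word.encode("iso-8859-1")  # "ó" -> f3 (1 byte)

print(utf8_bytes.hex(" "))    # 43 c3 b3 72 64 6f 62 61
print(latin1_bytes.hex(" "))  # 43 f3 72 64 6f 62 61

# Read the iso-8859-1 bytes as if they were utf-8: 0xf3 starts a multi-byte
# sequence, but the following "r" (0x72) is not a continuation byte, so the
# decoder gives up on 0xf3 and substitutes U+FFFD, the placeholder symbol.
garbled = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(garbled))  # 'C\ufffdrdoba'
```

The terminal does essentially what errors="replace" does here: it keeps
going and marks the undecodable byte instead of aborting.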

You can convert the text to utf-8 with the "iconv" utility and then it
should work:

$ iconv -f iso8859-1 -t utf8 test2.txt > test3.txt
$ cat test3.txt
Córdoba
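If you ever need the same conversion inside a script, it can be sketched
in Python roughly like this (the function name latin1_file_to_utf8 is made
up for illustration; on the command line iconv is the simpler tool):

```python
import tempfile
from pathlib import Path


def latin1_file_to_utf8(src: Path, dst: Path) -> None:
    """Roughly what "iconv -f iso8859-1 -t utf8 src > dst" does."""
    # Decoding as iso-8859-1 cannot fail: each of the 256 byte values maps
    # to exactly one character, so 0xf3 comes back as U+00F3 ("ó").
    text = src.read_bytes().decode("iso-8859-1")
    # Re-encode as utf-8, where U+00F3 becomes the two bytes c3 b3.
    dst.write_bytes(text.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    test2 = Path(tmp, "test2.txt")
    test3 = Path(tmp, "test3.txt")
    test2.write_bytes(b"C\xf3rdoba\n")  # same bytes as test2.txt in the hd dump
    latin1_file_to_utf8(test2, test3)
    converted = test3.read_bytes()

print(converted.hex(" "))  # 43 c3 b3 72 64 6f 62 61 0a
```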

The underlying problem is that plain text files usually carry no header
that specifies their encoding, so a program that processes the text may
interpret the byte sequences incorrectly. (The "file" utility inspects
the content and guesses the encoding, but many other programs just assume
some default.) The best approach is probably to convert everything to
utf-8 and configure all editors and pagers to use utf-8 by default.
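Since the encoding has to be guessed, a bulk conversion needs some care:
running a latin-1 to utf-8 conversion on a file that is already utf-8
turns "ó" into "Ã³". A minimal sketch of one common heuristic, assuming
iso-8859-1 as the only alternative (the name ensure_utf8 and the fallback
are my own choices, and this is not how "file" works internally): if the
bytes decode cleanly as utf-8, leave them alone, otherwise convert.

```python
def ensure_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return data as utf-8, converting from `fallback` only if needed.

    Heuristic: bytes that decode cleanly as utf-8 are assumed to be utf-8
    already. Latin-1 text with accented letters rarely forms valid utf-8
    by accident, but this is a guess, not a guarantee.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback).encode("utf-8")
    return data  # already valid utf-8 (plain ASCII included): leave it alone


latin1 = b"C\xf3rdoba\n"    # like test2.txt
utf8 = b"C\xc3\xb3rdoba\n"  # like test1.txt

print(ensure_utf8(latin1) == utf8)  # converted
print(ensure_utf8(utf8) == utf8)    # left untouched

# Without the guard, converting utf-8 input a second time double-encodes
# it: "ó" becomes the four bytes c3 83 c2 b3, displayed as "Ã³".
double = utf8.decode("iso-8859-1").encode("utf-8")
print(double.hex(" "))  # 43 c3 83 c2 b3 72 64 6f 62 61 0a
```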

If you have filenames with non-ASCII characters, the "convmv" package can
help to convert them so that they show up correctly on a utf-8 setup.
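The idea behind such a rename can be sketched in Python, assuming the old
names are iso-8859-1 and the filesystem stores names as raw bytes (as on
Linux). fix_filenames is a hypothetical helper, not part of convmv; like
convmv, which only renames when run with --notest, it defaults to a dry
run that just reports what it would do.

```python
import os
import tempfile


def fix_filenames(directory: bytes, fallback: str = "iso-8859-1", dry_run: bool = True):
    """Rename entries whose names are not valid utf-8, assuming `fallback`.

    Returns a list of (old_name, new_name) byte pairs. With dry_run=True
    (the default) nothing is renamed.
    """
    changes = []
    for name in os.listdir(directory):  # bytes argument -> bytes names
        try:
            name.decode("utf-8")  # already valid utf-8: nothing to do
        except UnicodeDecodeError:
            new_name = name.decode(fallback).encode("utf-8")
            changes.append((name, new_name))
            if not dry_run:
                os.rename(os.path.join(directory, name),
                          os.path.join(directory, new_name))
    return changes


with tempfile.TemporaryDirectory() as tmp:
    d = os.fsencode(tmp)
    open(os.path.join(d, b"C\xf3rdoba.txt"), "wb").close()  # iso-8859-1 name
    preview = fix_filenames(d)                 # only reports
    applied = fix_filenames(d, dry_run=False)  # actually renames
    after = os.listdir(d)

print(preview)
print(after)
```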

Finally, don't forget that you have to put

\usepackage[utf8]{inputenc}

into the preamble of your latex documents for latex to process the
special characters in a utf-8 input file correctly.
 
> Maybe if I show you some of my configuration files we can solve this:

[...]

> Euclides:/etc# egrep -v "^\#|^$" default/console-setup

[...]

> XKBMODEL="pc105"
> XKBLAYOUT="es"
> XKBVARIANT="nodeadkeys"
> XKBOPTIONS="lv3:ralt_switch,compose:rwin"
> BOOTTIME_KMAP_MD5="80842f76431ec5259444e6b8f4a53b62"
> #====================================================================
> 
> my xorg.conf file contains:
> 
> #====================================================================
> Section "InputDevice"
>         Identifier      "Generic Keyboard"
>         Driver          "kbd"
>         Option          "CoreKeyboard"
>         Option          "XkbRules"      "xorg"
>         Option          "XkbModel"      "pc105"
>         Option          "XkbLayout"     "es"
>         Option          "XkbVariant"    "la"
> EndSection
> #====================================================================

That all looks OK to me. If you like, you can check whether console-setup
supports the "la" XKBVARIANT; if it does, the two configurations would be
identical.

-- 
Regards,            | http://users.icfo.es/Florian.Kulzer
          Florian   |

