
Re: [debian-user] Converting to UTF-8 from ISO-8859



>--[Alex Malinovich]--<demonbane@the-love-shack.net>

> 1) I've set up an .Xmodmap file to map my left Windows key to Multi_key
> so that I can type extended characters. However, I have to run "xmodmap
> .Xmodmap" manually every time I restart X. I'm guessing that I should
> put this in an X startup script. A .bashrc equivalent for X.
> Unfortunately, I'm not sure what the proper file to put it in is.

I don't know the answer to this one, but isn't the right Windows key
already mapped to Multi_key by default?

> 2) Is there a way to get UTF-8 support in a regular text console?

Edit /etc/console-tools/config to contain a line like "SCREEN_FONT=lat0-16",
IIRC. And of course LC_ALL has to be set to a UTF-8 locale.

> 3) Assuming that #2 is possible, how can I type extended characters in a
> text console? While in X, I can, for example, type "Windows Key", Y, =,
> and get the yen symbol (¥).

There definitely is a way to modify the keyboard layout. Try
dpkg-reconfigure console-common; it offers a way to select one. Whether it
will have the requested bindings, I don't know...

> 5) Just to satisfy my own curiosity, could someone explain the
> difference between all of the different UTF flavors? I've seen UTF-7,
> UTF-8, UTF-16

UTF-8 is the encoding of choice; it encodes Unicode code points into
sequences of one to four 8-bit bytes. Main characteristics: ASCII
transparent, i.e. every US-ASCII text is also a UTF-8 text; stateless, i.e.
each valid UTF-8 sequence always has the same meaning independent of the
text before it; UTF-8 strings are simple C strings, as a 0 byte only ever
stands for U+0000 itself. The UTF-7 encoding is a 7-bit encoding; it is not
fully US-ASCII transparent, because "+" serves as a shift character and has
to be escaped. Its only use is for emails, as UTF-7 does not require the
additional layer of transfer encoding that 8-bit characters need in emails.
UTF-16 uses a variable number of 16-bit units. Only code points above
U+FFFF, which are rarely needed, require two 16-bit units (a surrogate
pair), while all others take just one. It can't be US-ASCII transparent - a
UTF-16 string containing characters from the US-ASCII (or ISO-8859-1) range
will have embedded 0 bytes and thus won't be a valid C string. Also, UTF-16
uses 16-bit values and as such has endianness issues.
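
A minimal Python sketch of the points above, assuming Python 3 and its
built-in codecs. The helper name encode_all and the sample characters are
only chosen for illustration:

```python
# Encode single characters in the three UTF flavors discussed above.
# "utf-16-le" and "utf-16-be" are the two byte orders of UTF-16.

def encode_all(ch):
    """Return the encoded bytes of ch for each encoding, keyed by name."""
    return {
        "utf-8": ch.encode("utf-8"),
        "utf-7": ch.encode("utf-7"),
        "utf-16-le": ch.encode("utf-16-le"),
        "utf-16-be": ch.encode("utf-16-be"),
    }

# Illustrative samples: ASCII letter, plus sign, yen sign, euro sign,
# and a code point above U+FFFF (musical symbol G clef).
samples = ["A", "+", "\u00a5", "\u20ac", "\U0001d11e"]

for ch in samples:
    for name, data in encode_all(ch).items():
        # Show the code point, the encoding, and the raw bytes in hex.
        print(f"U+{ord(ch):04X}  {name:<10} {data.hex()}")
```

In the output, "A" stays a single byte in UTF-8 but gains a 0 byte in
UTF-16, the two UTF-16 byte orders differ, "+" is escaped in UTF-7, and the
last sample needs two 16-bit units in UTF-16.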

-- 
         100 DM =  51  € 13 ¢.
         100  € = 195 DM 58 pf.
  mailto:ruediger@ruediger-kuhlmann.de
    http://www.ruediger-kuhlmann.de/


