
Re: [debian-user] Converting to UTF-8 from ISO-8859



On Sat, 2003-06-14 at 06:58, Rüdiger Kuhlmann wrote:
> >--[Alex Malinovich]--<demonbane@the-love-shack.net>
> 
> > 1) I've set up an .Xmodmap file to map my left Windows key to Multi_key
> > so that I can type extended characters. However, I have to run "xmodmap
> > .Xmodmap" manually every time I restart X. I'm guessing that I should
> > put this in an X startup script. A .bashrc equivalent for X.
> > Unfortunately, I'm not sure what the proper file to put it in is.
> 
> I don't know the answer to this one, but isn't the right Windows key used
> for that by default already?

Not on my system. xmodmap shows the two Windows keys set to Super_L and
Super_R.

> > 2) Is there a way to get UTF-8 support in a regular text console?
> 
> Edit /etc/console-tools/config to contain a line like "SCREEN_FONT=lat0-16"
> IIRC. And of course have LC_ALL set correctly.

I've done this, and set LC_ALL to en_US.UTF-8, but I still can't get
proper UTF-8 character support in a console. I have files with letters
like Ü, Ć, Æ, and Š in the names, which show up fine in gnome-terminal.
But from a text console, with my locale set to en_US.UTF-8, I get two
garbled characters ('Ü' = 'Ã?'). This also makes ncurses apps mostly
unusable. Setting LC_ALL to C fixes ncurses and displays all extended
characters as '??'.
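
The two garbled characters would be consistent with the console treating
each byte of a UTF-8 file name as a separate Latin-1 character. A small
Python sketch of that assumption (the helper name as_latin1_console is made
up for illustration, it is not part of any console tool):

```python
def as_latin1_console(text: str) -> str:
    """Encode text as UTF-8, then read every byte back as one Latin-1
    character, roughly what a console without UTF-8 mode would display."""
    return text.encode("utf-8").decode("latin-1")


for ch in "ÜĆÆŠ":
    raw = ch.encode("utf-8")
    shown = as_latin1_console(ch)
    # Print the character, its UTF-8 bytes, and the code points a
    # Latin-1 console would try to draw instead.
    print(ch, raw.hex(), [hex(ord(c)) for c in shown])
```

For Ü the bytes are 0xC3 0x9C. In Latin-1, 0xC3 is Ã and 0x9C is a control
code with no glyph, which may be why the second character shows up as '?'.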

What's odd is that I can type some of these characters in the console
and have them show up correctly. For example, hitting Compose, A, E
produces an Æ, yet doing an "ls Æ*" in a directory that has filenames
starting with Æ (which are garbled as stated above) returns no results. Other
characters, like Ć, Č, and Š for example, don't appear to be available
at all.
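
One possible explanation for the failed match, assuming the console
keyboard still sends a single Latin-1 byte for Compose, A, E: the typed
byte and the bytes stored in the file name differ, so a prefix pattern
cannot match. The file name below is hypothetical, and the prefix test
only approximates what "ls Æ*" does:

```python
# Hypothetical file name, stored on disk as UTF-8 bytes.
stored = "Æther.txt".encode("utf-8")

# Assumed keyboard output in Latin-1 mode: one byte, 0xC6.
typed_latin1 = "Æ".encode("latin-1")

# Keyboard output in UTF-8 mode: two bytes, 0xC3 0x86.
typed_utf8 = "Æ".encode("utf-8")

# A glob like "Æ*" boils down to a byte-wise prefix comparison.
print("latin-1 prefix matches:", stored.startswith(typed_latin1))
print("utf-8 prefix matches:  ", stored.startswith(typed_utf8))
```

If that is what is happening, the keyboard side of the console needs to be
in UTF-8 mode as well, not only the display side.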

> > 3) Assuming that #2 is possible, how can I type extended characters in a
> > text console? While in X, I can, for example, type "Windows Key", Y, =,
> > and get the yen symbol (¥).
> 
> There definitely is a way to modify the keyboard layout. Try
> dpkg-reconfigure console-common, there is some way to select one. Whether it
> will have the requested bindings, I don't know...

Your suggestion for modifying /etc/console-tools/config got me on the
right track to taking care of this problem. In that directory is a file
called remap which allowed me to remap Windows keys to Compose.

> > 5) Just to satisfy my own curiosity, could someone explain the
> > difference between all of the different UTF flavors? I've seen UTF-7,
> > UTF-8, UTF-16
> 
> UTF-8 is the encoding of choice; it encodes Unicode code points into
> sequences of 8-bit characters. Main characteristics: ASCII transparent, i.e.
> every US-ASCII text is also a UTF-8 text; stateless, i.e. each valid UTF-8
> sequence always has the same meaning, independent of the text before it;
> UTF-8 strings are simple C strings. The UTF-7 encoding is a 7-bit encoding,
> and as such cannot be US-ASCII transparent; its only use is for emails, as
> UTF-7 does not require another layer of encoding the way 8-bit characters do
> in emails. UTF-16 uses a variable number of 16-bit units per character. Only
> code points above U+FFFF, which are rarely used, require more than one
> 16-bit unit, while most need just one. It can't be US-ASCII transparent: a
> UTF-16 string containing characters from the US-ASCII (or ISO-8859-1) range
> will have embedded 0 bytes and thus won't be a valid C string. Also, UTF-16
> uses 16-bit values and as such has endianness issues.

So I'm guessing that UTF-8 can use multiple bytes per character somehow,
while keeping the 100 or so characters that match ASCII as single bytes?
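
As far as I can tell, that is the idea: the 128 US-ASCII code points
(U+0000 to U+007F) keep their single-byte values in UTF-8, and everything
else takes two to four bytes. A short Python sketch to check this and the
UTF-16 points quoted above. The helper name encoded_lengths and the sample
characters are picked for illustration only; the last sample lies above
U+FFFF:

```python
def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# "A" is ASCII; "𝄞" (U+1D11E) needs two 16-bit units in UTF-16.
for ch in ["A", "¥", "Ü", "Š", "€", "𝄞"]:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8_len} byte(s)  UTF-16: {utf16_len} byte(s)")

# ASCII text is byte-for-byte unchanged in UTF-8 ...
print("ls".encode("utf-8"))      # b'ls'
# ... but UTF-16 embeds zero bytes, and their position depends on byte order.
print("ls".encode("utf-16-be"))  # b'\x00l\x00s'
print("ls".encode("utf-16-le"))  # b'l\x00s\x00'
```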

-- 
Alex Malinovich
Support Free Software, delete your Windows partition TODAY!
Encrypted mail preferred. You can get my public key from any of the
pgp.net keyservers. Key ID: A6D24837

Attachment: signature.asc
Description: This is a digitally signed message part