
Bug#251550: Keyboard freezes when typing non-ASCII letters (was Re: Bug#251550: Bug#254630: LVM names)



On Sun, Jun 20, 2004 at 08:25:06AM +0200, Christian Perrier wrote:
> Quoting Eugeniy Meshcheryakov (eugen@univ.kiev.ua):
> > Christian Perrier wrote:
> > >
> > >Currently unusable in cdebconf (seems to be a whiptail bug in Unicode
> > >environments). Just try to enter any non-ASCII character in a dialog
> > >box... :-(
> > >
> > >#251550
> > >
> > I can enter Cyrillic characters (which are not ASCII) used in Ukrainian
> > in d-i. This looks more like a problem with keymap files.
> 
> Hmmmm, so nothing to do with whiptail, then? I'm puzzled.
> 
> Let's ask Alastair; maybe he will have some ideas...
> 
> Alastair, could you have a look at #251550?
> 
> Basically, you just enter a non-ASCII character in a dialog during
> the Debian Installer's first stage (when installing in French, German...
> and probably even English), for instance in the dialog asking for a
> host name or IP address.
> 
> Then the display seems frozen: typing anything does nothing. You
> have to hit Ctrl-A to get it working again.
> 
> This is a serious problem because any input of such a "high ASCII"
> character will "freeze" the installer, from the user's point of view.

This looks somewhat similar to #243373, where output was truncated
when illegal UTF-8 sequences were printed.  Here, input is broken when
the keyboard sends illegal UTF-8 sequences.  Of course, the keyboard
should send valid UTF-8 sequences, so one cannot blame whiptail
too much.
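
To make "illegal UTF-8 sequence" concrete, here is a minimal POSIX sh
sketch that classifies a single byte the way a UTF-8 decoder would.
utf8_class is a made-up helper name for illustration.  One possible
reading of the freeze (an assumption, not verified against whiptail or
newt) is that a bare Latin-1 byte such as 0xe9 looks like the start of
a three-byte sequence, so the reader keeps waiting for continuation
bytes that never arrive.

```sh
# utf8_class (hypothetical helper): report how a UTF-8 decoder treats a
# single byte, following the RFC 3629 lead-byte ranges.
utf8_class() {
    b=$(( $1 ))
    if   [ "$b" -lt $(( 0x80 )) ]; then echo "ascii"
    elif [ "$b" -lt $(( 0xc0 )) ]; then echo "continuation"
    elif [ "$b" -lt $(( 0xc2 )) ]; then echo "invalid"   # 0xc0/0xc1: overlong only
    elif [ "$b" -lt $(( 0xe0 )) ]; then echo "lead, 1 more byte"
    elif [ "$b" -lt $(( 0xf0 )) ]; then echo "lead, 2 more bytes"
    elif [ "$b" -lt $(( 0xf5 )) ]; then echo "lead, 3 more bytes"
    else                                echo "invalid"   # 0xf5-0xff never appear
    fi
}

utf8_class 0xe9   # Latin-1 eacute sent as a bare byte
utf8_class 0xa9   # second byte of the real UTF-8 encoding of U+00E9
```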

According to kbd_mode(1), the Linux console keyboard driver has four
modes, two of which are of interest here: ASCII mode and UTF-8 mode.
Internally the Linux kernel uses Unicode; in UTF-8 mode there is no
conversion, and characters are passed to the kernel as-is (there seems
to be a UTF-16 -> UTF-8 conversion, but it can be ignored).  In ASCII
mode, characters are converted to Unicode using the charset that was
in effect when loadkeys was invoked.

On the other hand, keymaps(5) explains how to write keymap files.
Characters can be defined numerically (decimal or octal value),
literally (e.g. eacute) or by their Unicode code points (e.g. U+00E9).
When loadkeys parses keymap files, numerical and literal values are
converted to 0-255 values (according to a charset), whereas Unicode
values are stored XORed with 0xf000.  A value is then (roughly)
decoded as follows:
  * if value >= 0x0c00, this is a Unicode character: value ^ 0xf000
  * otherwise the character was given in numerical or literal notation,
    and its value in the current charset is the least significant byte.
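
The rough rule above fits in a small sh function.  This is only a
sketch: decode_keysym is a hypothetical name, not something kbd
provides, and the 0x0c00 threshold is copied from the rule as stated.
Like the rule, it ignores the keysym type held in the high byte, so a
value such as 0x0403 (dead_tilde in the dumpkeys output) would be
reported as byte 0x03 even though it is not a charset character.

```sh
# decode_keysym (hypothetical helper): apply the rough decoding rule to a
# keysym value as stored by loadkeys, e.g. 0xf0e9 or 0x00e9.
decode_keysym() {
    v=$(( $1 ))
    if [ "$v" -ge $(( 0x0c00 )) ]; then
        # Unicode notation: undo the XOR with 0xf000
        printf 'unicode U+%04X\n' $(( v ^ 0xf000 ))
    else
        # numerical/literal notation: low byte is the value in the charset
        printf 'charset byte 0x%02x\n' $(( v & 0xff ))
    fi
}

decode_keysym 0xf0e9   # U+00E9 written in Unicode notation
decode_keysym 0x00e9   # eacute written literally (Latin-1 byte 0xe9)
```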

For reasons I do not understand, these two conversions (keyboard mode
and input parsing) are mixed up.  Now consider this line from fr-latin1:
  keycode 3 = eacute two dead_tilde

  $ kbd_mode -a
  $ export LANG=fr_FR
  $ dumpkeys -n
  ...
  keycode   3 = 0x00e9 0x0032 0x0403 0x0000          
  ...
  $ export LANG=fr_FR.UTF-8
  $ unicode_start
  $ dumpkeys -n
  ...
  keycode   3 = 0x00e9 0x0032 0x0403 0x0000          
  ...

But now the keyboard is in UTF-8 mode, so bytes are passed to the
kernel without conversion, and 0x00 0xe9 is sent instead of its UTF-8
representation 0xc3 0xa9.
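
For reference, the expected bytes can be computed by hand.  A minimal
sketch, where utf8_encode is a hypothetical helper that only handles
code points below U+0800, showing why U+00E9 becomes 0xc3 0xa9:

```sh
# utf8_encode (hypothetical helper): print the UTF-8 bytes of a code
# point; this sketch only handles U+0000..U+07FF.
utf8_encode() {
    cp=$(( $1 ))
    if [ "$cp" -lt $(( 0x80 )) ]; then
        # one byte: 0xxxxxxx
        printf '0x%02x\n' "$cp"
    elif [ "$cp" -lt $(( 0x800 )) ]; then
        # two bytes: 110xxxxx 10xxxxxx
        printf '0x%02x 0x%02x\n' $(( 0xc0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3f) ))
    else
        echo "utf8_encode: code point out of range for this sketch" >&2
        return 1
    fi
}

utf8_encode 0x00e9   # prints 0xc3 0xa9
```
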
If eacute is replaced by its Unicode notation U+00E9 in the keymap
file, everything works fine.  But this keymap cannot be used when the
keyboard is in ASCII mode.  The only solution is to have two keymaps,
one for ASCII mode and the other for UTF-8 mode.
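
Generating the UTF-8 keymap from the ASCII one could be scripted.  This
is a rough sketch only: to_unicode_keymap is hypothetical, its name
table covers just four Latin-1 letters, it assumes a Latin-1 charset
(so a letter's code point equals its byte value), and awk collapses
runs of whitespace on any line it rewrites.  A real tool would need the
full charset table and should leave comment lines alone.

```sh
# to_unicode_keymap (hypothetical helper): rewrite literal Latin-1 names
# into U+XXXX notation. Only four names are mapped in this sketch.
to_unicode_keymap() {
    awk '
    BEGIN {
        u["agrave"] = "U+00E0"; u["ccedilla"] = "U+00E7"
        u["egrave"] = "U+00E8"; u["eacute"]   = "U+00E9"
    }
    {
        # replace whole fields only; other keysyms on the line are kept
        for (i = 1; i <= NF; i++)
            if ($i in u) $i = u[$i]
        print
    }'
}

echo "keycode 3 = eacute two dead_tilde" | to_unicode_keymap
```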

This looks pretty crazy; loadkeys should automatically convert from
numerical/literal values to Unicode notation (and vice versa)
depending on the current keyboard mode.
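
A sketch of what such an automatic conversion might look like, assuming
a Latin-1 charset (bytes 0x00-0xff map directly to U+0000-U+00FF).
convert_keysym is hypothetical and not part of loadkeys; it treats only
values whose high byte is 0, like 0x00e9 and 0x0032 in the dump, as
characters, and leaves everything else (such as 0x0403, dead_tilde)
untouched.

```sh
# convert_keysym (hypothetical): $1 = keyboard mode ("ascii" or "utf8"),
# $2 = keysym value as shown by dumpkeys -n. Assumes a Latin-1 charset.
convert_keysym() {
    mode=$1
    v=$(( $2 ))
    x=$(( v ^ 0xf000 ))   # XOR with 0xf000 converts in either direction
    if [ "$mode" = utf8 ] && [ $(( v >> 8 )) -eq 0 ]; then
        # Latin-1 character in numerical/literal form -> Unicode notation
        printf '0x%04x\n' "$x"
    elif [ "$mode" = ascii ] && [ "$v" -ge $(( 0x0c00 )) ] && [ "$x" -le $(( 0xff )) ]; then
        # Unicode notation inside the Latin-1 range -> plain byte value
        printf '0x%04x\n' "$x"
    else
        # other keysym types, or characters outside Latin-1: unchanged
        printf '0x%04x\n' "$v"
    fi
}

convert_keysym utf8 0x00e9    # eacute for UTF-8 mode
convert_keysym ascii 0xf0e9   # U+00E9 for ASCII mode
convert_keysym utf8 0x0403    # dead_tilde, left alone
```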

Denis


