
Bug#747922: kernel hang during boot on s390x in a virtual machine under z/VM when logged on to a terminal



Package: linux
Version: 3.10.1-1
Severity: normal

I have discovered a bug in the Linux kernel.
It occurs only on the s390x port, only when running
in a virtual machine under z/VM, only with conmode=3215,
and only when the virtual machine is logged on to a 3270 terminal (not
disconnected).  I am using TERM=dumb as a kernel boot parameter, and
the console definition in /etc/inittab looks like this:

   T0:23:respawn:/sbin/getty -L --noclear ttyS0 9600 dumb

The problem is that the kernel hangs during boot.  The last message
displayed on the console during boot before the hang varies.  One
common freeze point for the 3.13 kernel is

   PID hash table entries: 2048 (order: 2, 16384 bytes)

Pressing the Enter key a couple of times gets it going again.  Pressing
it once puts the virtual machine into a "VM READ" state.  Pressing it
a second time causes console output to resume.  However, because of
buffering, I don't think the above message indicates where the
kernel actually is in its processing.  Many of the messages have
timestamps.  By comparing the timestamps, I can tell where the
long pause actually occurred.  In a recent boot, for example, I saw the
following sequence of messages:

-----

               Begin: Loading essential drivers ... done.
               Begin: Running /scripts/init-premount ... done.
               Begin: Mounting root file system ...
               Begin: Running /scripts/local-top ... done.
               Begin: Running /scripts/local-premount ...
[    1.973615] PM: Starting manual resume from disk
               done.
[    1.999199] EXT4-fs (dasdc1): mounting ext3 file system using the ext4 subsystem
[    2.042526] EXT4-fs (dasdc1): mounted filesystem with ordered data mode. Opts: (null)
               Begin: Running /scripts/local-bottom ... done.
               done.
               Begin: Running /scripts/init-bottom ... done.

               INIT: version 2.88 booting


               Using makefile-style concurrent boot in runlevel S.

               Starting the hotplug events dispatcher: udevd.

               Synthesizing the initial hotplug events...
[  164.525332] systemd-udevd[277]: starting version 204
               done.

               Waiting for /dev to be fully populated...

-----

(I have reformatted the above messages so that messages with and without
a timestamp prefix line up at the start of the main message text.)  As you
can see, there is a gap of about 162 seconds between 2.042526 and
164.525332.  That is where the kernel was hung, waiting for me to press the
Enter key.  The hang is somewhere between mounting the permanent root file
system read-only and starting the second instance of the udev daemon.  (The
first instance of the udev daemon starts shortly after the initial RAM
file system is mounted.)
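For longer console logs, the same timestamp comparison can be automated.  The following is a minimal sketch, assuming the console output has been captured as text lines where timestamped messages begin with the usual "[    1.973615] " printk prefix; the function name largest_gap is hypothetical and not part of any existing tool.  It finds the widest jump between consecutive timestamped lines and ignores lines without a timestamp.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the "[    1.973615]" timestamp prefix at the start of a line.
# Lines without this prefix (e.g. initramfs "Begin: ..." messages) are skipped.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[float, float, float]]:
    """Return (gap, earlier, later) for the widest jump between consecutive
    timestamped lines, or None if fewer than two timestamped lines exist."""
    stamps = []
    for line in lines:
        m = TIMESTAMP.match(line)
        if m:
            stamps.append(float(m.group(1)))
    best = None
    for earlier, later in zip(stamps, stamps[1:]):
        gap = later - earlier
        if best is None or gap > best[0]:
            best = (gap, earlier, later)
    return best


if __name__ == "__main__":
    # Excerpt of the boot log quoted above.
    log = [
        "[    1.973615] PM: Starting manual resume from disk",
        "               done.",
        "[    1.999199] EXT4-fs (dasdc1): mounting ext3 file system using the ext4 subsystem",
        "[    2.042526] EXT4-fs (dasdc1): mounted filesystem with ordered data mode. Opts: (null)",
        "               INIT: version 2.88 booting",
        "[  164.525332] systemd-udevd[277]: starting version 204",
    ]
    gap, earlier, later = largest_gap(log)
    print(f"{gap:.6f} s between {earlier} and {later}")
    # prints: 162.482806 s between 2.042526 and 164.525332
```

Only consecutive timestamped messages are compared, so untimestamped lines in between do not affect the result.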

By bisecting with official Debian kernel image packages only, it appears
that the problem was introduced between 3.9.8-1 and 3.10.1-1.  That is,

   linux-image-3.9-1-s390x_3.9.8-1_s390x.deb     works, and
   linux-image-3.10-1-s390x_3.10.1-1_s390x.deb   fails.

Every version I have tried since 3.10.1-1 also fails.
Note that a failing kernel does not always fail: sometimes it
boots without hanging.  But it hangs the majority of the time.
If the virtual machine is disconnected, that is, not logged on to a
real terminal, it seems to always boot fine, whether the virtual
machine has a SECUSER or not.  When logged on to a terminal, the chances
of failure are increased if the kernel is explicitly selected from the
boot menu, such as with

   #CP VINPUT VMSG 1

as opposed to letting the default kernel boot when the menu
times out.  I don't know why that matters, but that has been
my experience.


I will be more than happy to assist in debugging this.

-- 
  .''`.     Stephen Powell    
 : :'  :
 `. `'`
   `-
