
Bug#612707: Upgrade Lenny to Squeeze (s390 port) results in unbootable system (recovery successful)



Package: upgrade-reports
Version: 6.0

I just finished doing an upgrade from Lenny to Squeeze on the s390
port, running in a virtual machine under z/VM 5.4.0.  I was ultimately
successful, but the upgrade was not smooth.  I encountered two major
problems, one related to initramfs-tools and/or module-init-tools
and the other related to udev.

First, I will describe my disk environment:

   device  device      partition    mount  device  device        device
   number  name        name         point  type    format        driver
   ------  ----------  -----------  -----  ------  ------------  ------
   0200    /dev/dasda  /dev/dasda1  /      3390    CMS reserved  DIAG
   0201    /dev/dasdb  /dev/dasdb1  /boot  3390    CMS reserved  ECKD
   0202    /dev/dasdc  /dev/dasdc1  /home  3390    CMS reserved  DIAG
   0203    /dev/dasdd  /dev/dasdd1  swap   3390    CMS reserved  DIAG

/etc/modprobe.d/dasd contained this statement:

   options dasd_mod dasd=0.0.0200(diag),0.0.0201,0.0.0202-0.0.0203(diag)

/etc/initramfs-tools/modules contained this statement:

   dasd_diag_mod

/etc/initramfs-tools/conf.d/driver-policy contained this statement:

   MODULES=dep

The zero-length files in /etc/sysconfig/hardware originally created by
the Debian installer for the four dasd devices had been deleted (rm),
since the devices are now brought online by the kernel (via the options
passed to the dasd_mod module) instead of by sysconfig-hardware.

/etc/fstab looked like this:

   proc /proc proc defaults 0 0
   /dev/dasda1 / ext3 defaults,errors=remount-ro 0 1
   /dev/dasdb1 /boot ext3 defaults 0 2
   /dev/dasdc1 /home ext3 defaults 0 2
   /dev/dasdd1 none swap sw 0 0

All of this worked fine under Lenny.  I made sure my Lenny system was
up-to-date with the latest point release and security updates before
starting.  Also, I purged all obsolete packages.  I checked the
package database and found no problems.  I then updated
/etc/apt/sources.list and pointed it to the squeeze repositories.
I then ran

   apt-get update
   apt-get upgrade

It updated a bunch of packages, but did not install any new packages
or delete any old packages.  So far, so good.  I then upgraded the
kernel and udev and did a reboot.

   apt-get install linux-image-2.6.32-5-s390x
   apt-get install udev
   shutdown -r now;exit

I did this from a remote SSH client.  The machine shut down, but
did not reboot.  (Yes, a new initial RAM file system was built
and zipl was re-run.)  I logged on to the virtual machine console
using a 3270 emulator under z/VM and rebooted again.  The kernel
began to boot, but it was unable to mount the permanent root file
system and dropped me into an "(initramfs)" busybox boot prompt.

   cat /proc/modules

revealed that modules dasd_mod, dasd_eckd_mod, and dasd_diag_mod,
among others, were loaded.  Poking around in the sysfs pseudo
file system revealed that the "use_diag" pseudo files for the
three DIAG devices (/sys/bus/ccw/devices/0.0.0200/use_diag,
/sys/bus/ccw/devices/0.0.0202/use_diag, and
/sys/bus/ccw/devices/0.0.0203/use_diag) all contained the value
"1", and /sys/bus/ccw/devices/0.0.0201/use_diag contained the
value "0".  This meant that the options statement in
/etc/modprobe.d/dasd had been read and applied.  But only
device number 0201 (/dev/dasdb1) had its "online" flag set
to "1" in /sys/bus/ccw/devices/0.0.0201.  The online flags
for the other three disk devices were set to "0".  I prodded
the boot process along by issuing these commands at the
"(initramfs)" boot prompt:

   echo 1 >/sys/bus/ccw/devices/0.0.0200/online
   echo 1 >/sys/bus/ccw/devices/0.0.0202/online
   echo 1 >/sys/bus/ccw/devices/0.0.0203/online
   exit
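
For anyone repeating this recovery, the inspection and the manual
fix above can be wrapped in two small shell functions.  This is only
a sketch, not something I ran during the upgrade: the function names
and the SYSFS_CCW variable are made up for illustration, and SYSFS_CCW
exists only so the functions can be tried against a scratch directory
instead of the real sysfs.

```shell
#!/bin/sh
# Sketch: report and set the "online" flag for CCW devices.
# SYSFS_CCW (hypothetical variable) defaults to the real sysfs path;
# point it at a scratch directory for a dry run.
SYSFS_CCW="${SYSFS_CCW:-/sys/bus/ccw/devices}"

# Print "<busid> use_diag=<n> online=<n>" for each bus ID given.
ccw_status() {
    for id in "$@"; do
        dir="$SYSFS_CCW/$id"
        [ -d "$dir" ] || { echo "$id: not found" >&2; continue; }
        printf '%s use_diag=%s online=%s\n' "$id" \
            "$(cat "$dir/use_diag")" "$(cat "$dir/online")"
    done
}

# Write 1 to the "online" attribute of every listed device that is
# not already online.
ccw_set_online() {
    for id in "$@"; do
        flag="$SYSFS_CCW/$id/online"
        [ -f "$flag" ] || { echo "$id: no online attribute" >&2; continue; }
        if [ "$(cat "$flag")" != "1" ]; then
            echo 1 >"$flag"
        fi
    done
}
```

With my device numbers, "ccw_status 0.0.0200 0.0.0201 0.0.0202 0.0.0203"
followed by "ccw_set_online 0.0.0200 0.0.0202 0.0.0203" would have the
same effect as the three echo commands.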

This manually brought the disk devices online and the boot process
continued.  I eventually got a login prompt on the 3215
virtual console, but I was unable to establish a connection
using a remote SSH client.  I logged in as root using the
3215 virtual console and did some poking around.  I discovered
that the network device (a virtual OSA) had not come online.
I got it manually working with the following commands:

   # echo 0.0.0300,0.0.0301,0.0.0302 >/sys/bus/ccwgroup/drivers/qeth/group
   # echo 0 >/sys/bus/ccwgroup/drivers/qeth/0.0.0300/layer2
   # echo 1 >/sys/bus/ccwgroup/drivers/qeth/0.0.0300/online
   # ifup eth0

(My virtual OSA was at device number 0300.)  This brought up
the network interface and I was then able to log in using a
remote SSH client.
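
The same sequence can be parameterized for a different device
number.  The following is a minimal sketch under the assumption that
the three bus IDs are the read, write, and data subchannels of one
qeth device; the function name and the QETH_SYSFS variable are
hypothetical, and the variable is there only to allow a dry run
against a scratch directory.

```shell
#!/bin/sh
# QETH_SYSFS (hypothetical variable) defaults to the real sysfs path.
QETH_SYSFS="${QETH_SYSFS:-/sys/bus/ccwgroup/drivers/qeth}"

# Group three subchannels into one qeth device, select layer 3 mode
# (layer2=0), and set the device online.
# Arguments: read, write, and data bus IDs.
qeth_bring_up() {
    read_id=$1
    write_id=$2
    data_id=$3
    echo "$read_id,$write_id,$data_id" >"$QETH_SYSFS/group" || return 1
    echo 0 >"$QETH_SYSFS/$read_id/layer2" || return 1
    echo 1 >"$QETH_SYSFS/$read_id/online"
}
```

In my case this would be "qeth_bring_up 0.0.0300 0.0.0301 0.0.0302",
followed by "ifup eth0".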

Without documenting the debugging process, let me just cut to
the chase and document the fixes for the above two problems.
The first was fixed by editing /etc/modprobe.d/dasd and adding
the following two lines:

   softdep dasd_eckd_mod pre: dasd_diag_mod
   softdep dasd_fba_mod pre: dasd_diag_mod

The above two lines declare a couple of "soft dependencies".
Both dasd_eckd_mod and dasd_fba_mod have a "hard dependency" on
dasd_mod, but now they also have a "soft dependency" on
dasd_diag_mod.  This causes modprobe to make sure that
dasd_diag_mod is loaded before it loads either dasd_eckd_mod
or dasd_fba_mod.  For any system which uses the DIAG driver,
this step is now necessary.  It was not necessary
under Lenny.  (I also renamed /etc/modprobe.d/dasd to
/etc/modprobe.d/dasd.conf, in accordance with current practice.)
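
I made this edit by hand, but if it had to be scripted, something
like the following sketch could add the two softdep lines without
duplicating them on a second run.  The helper names ensure_line and
add_dasd_softdeps are invented for this example.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already
# present.  A missing file is treated as empty and gets created.
ensure_line() {
    file=$1
    line=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >>"$file"
}

# Add the two soft dependencies to the given modprobe configuration
# file.
add_dasd_softdeps() {
    conf=$1
    ensure_line "$conf" "softdep dasd_eckd_mod pre: dasd_diag_mod"
    ensure_line "$conf" "softdep dasd_fba_mod pre: dasd_diag_mod"
}
```

It would be called as "add_dasd_softdeps /etc/modprobe.d/dasd.conf".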

Note that it is still necessary to list dasd_diag_mod in
/etc/initramfs-tools/modules, since with MODULES=dep in
/etc/initramfs-tools/conf.d/driver-policy, dasd_diag_mod
would not otherwise be included in the initial RAM file system.
Soft dependencies are not recognized when building the initial
RAM file system with MODULES=dep, only hard dependencies.
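
One way to confirm that the module really ended up in the initial
RAM file system is to search a listing of its contents.  The sketch
below assumes the listing comes from lsinitramfs (part of
initramfs-tools) and that the image is at the usual Debian path
/boot/initrd.img-2.6.32-5-s390x; the function name is made up, and I
did not use this check myself.

```shell
#!/bin/sh
# Read a list of file paths on stdin and succeed if one of them is
# the named kernel module (optionally with a compression suffix).
listing_has_module() {
    grep -Eq -e "/$1\.ko(\.[a-z0-9]+)?\$"
}
```

For example:
"lsinitramfs /boot/initrd.img-2.6.32-5-s390x | listing_has_module dasd_diag_mod"
returns success only if dasd_diag_mod is in the image.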

With these changes, the disk devices now come online automatically
at boot time.  (Note that this problem only affects systems
which use the DIAG driver.)

The second problem, that of the network device not coming
online, was solved by changing the udev rules files.
Fortunately, I had another Debian server, on which I had
installed Squeeze from scratch, to compare against.
The Lenny system had the following files:

/etc/udev/rules.d/65-sysconfig-hardware-net.rules:

   SUBSYSTEM=="net", IMPORT{program}="/etc/sysconfig/scripts/hardware/udev-net $env{PHYSDEVPATH}"
   SUBSYSTEM=="net", ENV{INTERFACE_NAME}=="?*", NAME="$env{INTERFACE_NAME}"

/etc/udev/rules.d/85-sysconfig-hardware.rules:

   SUBSYSTEM=="ccw", WAIT_FOR_SYSFS="online"
   SUBSYSTEM=="ccw", RUN+="/sbin/hwup -A -D $DEVPATH $env{SUBSYSTEM} %b"

My other Squeeze system, installed from scratch, contained the following:

/etc/udev/rules.d/65-sysconfig-hardware-net.rules:

   SUBSYSTEM=="net", IMPORT{program}="/etc/sysconfig/scripts/hardware/udev-net $DEVPATH"
   SUBSYSTEM=="net", ENV{INTERFACE_NAME}=="?*", NAME="$env{INTERFACE_NAME}"

/etc/udev/rules.d/85-sysconfig-hardware.rules:

   SUBSYSTEM=="ccw", WAIT_FOR_SYSFS="online"
   SUBSYSTEM=="ccw", RUN+="/sbin/hwup -A -D $devpath $env{SUBSYSTEM} $kernel"

The rules were similar between the two systems, but not exactly the same.
I made the rules on the partially-upgraded Lenny system match the rules
on the directly-installed Squeeze system exactly, saving the old versions
in another directory in case I needed to restore them.  I then rebuilt
the initial RAM file system with

   # update-initramfs -uk 2.6.32-5-s390x

(for good measure), re-ran zipl, shut down, and rebooted.  This time both
the dasd devices and the network device came online automatically during
boot.  From there, I was able to finish the upgrade without any significant
difficulty.
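
Saving the old rules files before replacing them, as described
above, could be sketched like this.  The function name and the
backup directory in the usage line are hypothetical; I simply copied
the files by hand.

```shell
#!/bin/sh
# Copy each named rules file from src_dir to backup_dir, preserving
# mode and timestamps, and never overwrite an existing backup.
# Usage: backup_rules SRC_DIR BACKUP_DIR FILE...
backup_rules() {
    src_dir=$1
    backup_dir=$2
    shift 2
    mkdir -p "$backup_dir" || return 1
    for name in "$@"; do
        if [ -e "$backup_dir/$name" ]; then
            echo "backup of $name already exists, skipping" >&2
            continue
        fi
        cp -p "$src_dir/$name" "$backup_dir/$name" || return 1
    done
}
```

For example: "backup_rules /etc/udev/rules.d /root/udev-rules.lenny
65-sysconfig-hardware-net.rules 85-sysconfig-hardware.rules".  A
"diff -u" between the saved copy and the new file then shows exactly
what changed.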

Either the upgrade process should automate these things or the release
notes should document these things.  These problems are unique to the
s390 port; so they should be documented in an s390-specific section.

Cheers,

-- 
  .''`.     Stephen Powell    
 : :'  :
 `. `'`
   `-


