
Bug#621080: Improving the s390 boot process



Package: sysconfig-hardware
Version: 0.0.10
Severity: wishlist
X-Debbugs-CC: debian-s390@lists.debian.org, debian-boot@lists.debian.org

s390 behaves differently than most hardware platforms with respect
to disk devices.  For most hardware platforms, udev is sufficient to
recognize disk devices and their partitions, and udev creates the
aliases in /dev/disk/by-uuid, /dev/disk/by-label, etc.  On the s390
platform, an extra step is required.  The devices must be
configured (i.e. the use_diag and/or readonly variables must be
appropriately set in the sysfs pseudo file system, or be allowed to
keep their default values) and then brought online (the online
variable must be set to 1 in the sysfs pseudo file system).  Only
then do block devices appear that udev can react to (i.e., by
recognizing partitions, disk labels, UUIDs, etc.).
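As a rough illustration of that two-step sequence, the following POSIX shell sketch does the following for one device:

  - it sets use_diag and readonly;
  - it then writes 1 to online.

It assumes the attributes live under /sys/bus/ccw/devices/<bus ID>/.  Two names are hypothetical: the dasd_online function, and the SYSFS_ROOT override.  SYSFS_ROOT exists only so the logic can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: configure one DASD through sysfs, then bring it online.
# SYSFS_ROOT is a hypothetical override (default /sys) for trying this
# against a scratch directory; on a real system leave it unset.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

dasd_online() {
    busid=$1        # e.g. 0.0.0200
    use_diag=$2     # 0 or 1
    ro=$3           # 0 or 1
    dev="$SYSFS_ROOT/bus/ccw/devices/$busid"
    [ -d "$dev" ] || { echo "no such ccw device: $busid" >&2; return 1; }
    # Step 1: write the configuration options before going online,
    # in the order described above.
    echo "$use_diag" > "$dev/use_diag" || return 1
    echo "$ro" > "$dev/readonly" || return 1
    # Step 2: only after this write does a block device appear for udev.
    echo 1 > "$dev/online"
}
```

For example, "dasd_online 0.0.0200 1 0" would request the DIAG driver, read-write, for device 0.0.0200.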

There are two main ways that disk devices can be brought online
in Linux for s390.  One way is via the "dasd" option passed to
the dasd_mod kernel module.  For example, suppose that the file
/etc/modprobe.d/dasd.conf exists and contains the following
statements:

   options dasd_mod dasd=0.0.0200(diag),0.0.0201,0.0.0202(ro),0.0.0203(diag:ro)
   softdep dasd_eckd_mod pre: dasd_diag_mod
   softdep dasd_fba_mod pre: dasd_diag_mod
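
The following POSIX shell sketch makes the syntax of that dasd= value concrete.  It splits the value into one line per device: a bus ID followed by its feature list.  parse_dasd_param is a hypothetical helper, not part of any package.  It handles only the forms used above.  Device ranges and any other syntax the dasd driver accepts are not covered.

```shell
#!/bin/sh
# Split a dasd= parameter value into "busid features" lines.
# Handles only "busid" and "busid(feat:feat)" entries, as in the example
# above; ranges such as 0.0.0200-0.0.020f are NOT expanded here.
# The body runs in a subshell, so changing IFS does not leak out.
parse_dasd_param() (
    IFS=,
    for entry in $1; do
        busid=${entry%%\(*}
        case $entry in
            *\(*\))
                feats=${entry#*\(}
                feats=${feats%\)}
                ;;
            *)
                feats=-        # no features given: driver defaults
                ;;
        esac
        echo "$busid $feats"
    done
)
```

Applied to the value in the options line above, it prints "0.0.0200 diag", "0.0.0201 -", "0.0.0202 ro" and "0.0.0203 diag:ro", one per line.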

Assuming that the initial RAM file system is re-built after creating
this file and that zipl is re-run, then on the next boot the kernel
will bring the four listed devices online automatically at boot time
using the configuration options specified.  (When using MODULES=dep
in /etc/initramfs-tools/initramfs.conf or
/etc/initramfs-tools/conf.d/driver-policy, something must cause
dasd_diag_mod to be included in the initial RAM file system, since
"soft" dependencies are not currently taken into account by initramfs-tools
when MODULES=dep is used.  See Debian bug report 588452.)
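
One way to provide that "something" is to list the module in /etc/initramfs-tools/modules.  initramfs-tools then copies it into the image regardless of MODULES=dep.  Note (1) below explains a drawback of this approach.  The sketch is idempotent, so running it more than once is harmless.  add_initramfs_module is a hypothetical helper name.  Its optional second argument exists only for testing.

```shell
#!/bin/sh
# Append a module name to an initramfs-tools "modules" file unless it is
# already present on a line of its own, so repeated runs are harmless.
add_initramfs_module() {
    mod=$1
    file=${2:-/etc/initramfs-tools/modules}
    if ! grep -qxF "$mod" "$file" 2>/dev/null; then
        echo "$mod" >> "$file"
    fi
}

# Typical use (as root), followed by rebuilding the image:
#   add_initramfs_module dasd_diag_mod
#   update-initramfs -u
```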

The other main way for disk devices to be brought online is via
sysconfig-hardware.  That is the method that the Debian port of Linux
for s390 has historically used and which the s390 version of the
Debian Installer assumes.  sysconfig-hardware provides a couple of udev rules
in /lib/udev/rules.d/85-sysconfig-hardware.rules that cause the hwup
command to be issued when a ccw device is detected.  hwup then configures the
device and brings it online, if there is a configuration file for it in
/etc/sysconfig/hardware.  There are a number of problems with this
implementation, however.
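
The lookup that decides whether hwup acts on a device can be pictured roughly as follows.  This is a hypothetical sketch, not code from sysconfig-hardware.  It assumes the config-ccw-<bus ID> naming seen in the example configuration file later in this report.  hw_config_for and the CONFIG_DIR override are invented names.

```shell
#!/bin/sh
# Hypothetical sketch: given a ccw bus ID (e.g. 0.0.0200), print the path of
# its sysconfig-hardware configuration file, or fail if there is none, in
# which case the device would simply be left offline.
CONFIG_DIR=${CONFIG_DIR:-/etc/sysconfig/hardware}

hw_config_for() {
    cfg="$CONFIG_DIR/config-ccw-$1"
    if [ -f "$cfg" ]; then
        echo "$cfg"
    else
        return 1
    fi
}
```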

The first problem is that a hardware device brought online via
sysconfig-hardware cannot be varied offline again.  This problem is
addressed in Debian bug report 620095, and the fix for this problem is
trivial.  A second problem is a lack of recognition for a 3380 device
attached to a 3880 control unit.  This problem is addressed in Debian
bug report 620126.  Again, the fix for this problem is trivial.  A
third problem is the lack of support for any DASD configuration options.
This problem is addressed in Debian bug report 620205.  Although the
fix is not a one-line change like the other two, it is still relatively
trivial: about twelve lines of code added to a shell script.

There is one final problem with the sysconfig-hardware method of
bringing disk devices online, and it is this problem that this bug
report will address.  The problem is that sysconfig-hardware is not
present in the initial RAM file system.  Once the root file system
specified in the kernel boot parameters has been brought online and
mounted as / (read-only at first, usually), then udev is restarted,
and sysconfig-hardware can then be used to bring other disks online
(/boot, /home, swap partitions, etc.).  But sysconfig-hardware cannot
be used to bring the disk containing the / partition online.

To circumvent this problem,
/usr/share/initramfs-tools/scripts/init-premount/sysconfig_hardware
was written.  This is a script supplied by sysconfig-hardware
and invoked by initramfs-tools prior to the attempt to mount
the permanent root file system (read-only at first, usually).
It works.  But this method has its drawbacks.  First of all,
this only works if the root file system is specified via a particular
form of a udev-created symbolic link to the block special file
for the partition, namely:
/dev/disk/by-path/ccw-0.0.@@@@-part#, where @@@@ is the four-digit
hexadecimal device number of the DASD device and # is the partition
number (1, 2, or 3 for cdl, always 1 for ldl or CMS format).
If the kernel boot parameters specify the root file system any
other way, such as by UUID, by LABEL, etc., the
/usr/share/initramfs-tools/scripts/init-premount/sysconfig_hardware
script cannot figure out which device to bring online, and the
boot therefore hangs in the initial RAM file system, waiting for a
root device that never appears.
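
The restriction is illustrated below with a short sketch of the only mapping such a script can perform: recovering a ccw bus ID from the root= value.  This is not the actual init-premount script, and root_busid is a hypothetical name.  A UUID= or LABEL= value carries no device number, so the script has nothing to bring online.

```shell
#!/bin/sh
# Print the ccw bus ID encoded in a root= value of the form
# /dev/disk/by-path/ccw-0.0.XXXX-partN, or fail for any other form.
root_busid() {
    case $1 in
        /dev/disk/by-path/ccw-0.0.[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-part[0-9]*)
            busid=${1#/dev/disk/by-path/ccw-}
            echo "${busid%-part*}"
            ;;
        *)
            # UUID=..., LABEL=..., /dev/dasda1, etc.: no device number here
            return 1
            ;;
    esac
}
```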

Another problem is that RESUME processing doesn't work.
In order for RESUME processing to work, the disk device which
contains the swap partition which contains the RESUME image
must be brought online prior to the initial read-only mount
of the permanent root file system.
/usr/share/initramfs-tools/scripts/init-premount/sysconfig_hardware
has no mechanism for bringing this device online, and therefore
RESUME will always fail.  (Quite frankly, I don't know whether there
is a corresponding SUSPEND mechanism on this hardware platform;
but if there is, RESUME will never work.)

I have been experimenting with putting sysconfig-hardware
in the initial RAM file system, and I have been quite successful
with it.  It solves all of the above problems.  I'd like to share
with you what I did, share the results, and appeal to have
this implemented as the standard way of doing things in Wheezy.

First, I manually fixed the bugs reported in Debian bug reports
620095, 620126, and 620205.  (The explanation of what needs to be fixed
is present in all these bug reports.)  Second, I deleted
/etc/udev/rules.d/65-sysconfig-hardware-net.rules and
/etc/udev/rules.d/85-sysconfig-hardware.rules.  These are duplicates of
/lib/udev/rules.d/65-sysconfig-hardware-net.rules and
/lib/udev/rules.d/85-sysconfig-hardware.rules, respectively.
Therefore, they are redundant.  Third, I erased
/usr/share/initramfs-tools/scripts/init-premount/sysconfig_hardware.
With my method, this file is not needed.
Fourth, I created configuration files for all of my DASD devices
in /etc/sysconfig/hardware, specifying the DASD configuration options
(DASD_USE_DIAG and DASD_READONLY) as appropriate for each device.
Fifth, I removed the line in /etc/initramfs-tools/modules that
listed dasd_diag_mod.  This was my former way of including dasd_diag_mod
in the initial RAM file system, since I use MODULES=dep in
/etc/initramfs-tools/conf.d/driver-policy, but I now have a better way.
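
The fourth step, creating one configuration file per DASD, could be scripted along the following lines.  Only the DASD_USE_DIAG and DASD_READONLY variable names come from this report.  The following are assumptions:

  - the 0/1 values;
  - the write_dasd_config helper;
  - the idea that these two lines alone make a valid file.

So compare the result with an existing configuration file before relying on it.

```shell
#!/bin/sh
# Hypothetical sketch: write config-ccw-<busid> containing the two DASD
# options. The 0/1 value convention is an assumption; a real file may need
# further variables. The optional fourth argument (target directory)
# exists only for testing.
write_dasd_config() {
    busid=$1
    use_diag=$2
    ro=$3
    dir=${4:-/etc/sysconfig/hardware}
    cat > "$dir/config-ccw-$busid" <<EOF
DASD_USE_DIAG=$use_diag
DASD_READONLY=$ro
EOF
}

# e.g. the equivalent of 0.0.0203(diag:ro):
#   write_dasd_config 0.0.0203 1 1
```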

Sixth, I edited /etc/modprobe.d/dasd.conf, a local file I had
created, and removed the "options dasd_mod ..." line from the file.
This was my former method of bringing the devices online: specifying
them in the dasd option passed to the dasd_mod module.  However,
I left intact the two "softdep" lines which specify that dasd_diag_mod
must be loaded prior to dasd_eckd_mod or dasd_fba_mod.  This is
essential if any DASD devices use the DIAG driver.  Seventh, I
created a file which I called /etc/initramfs-tools/hooks/sysconfig-hardware.
Since this is a user-created hook, it properly belongs under /etc.
But in a production environment, it should probably reside in
/usr/share/initramfs-tools/hooks.  My hook script looks like this:

-----

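#!/bin/sh
PREREQ=""
prereqs()
{
        echo "$PREREQ"
}

case $1 in
prereqs)
        prereqs
        exit 0
        ;;
esac

. /usr/share/initramfs-tools/hook-functions

# Begin real processing below this line

# Needed because MODULES=dep ignores the softdep lines; see note (1).
manual_add_modules dasd_diag_mod

# udev rule that runs hwup when a ccw device appears
copy_exec /lib/udev/rules.d/85-sysconfig-hardware.rules
# hwup/hwdown and the bash interpreter their scripts require; see note (3)
copy_exec /sbin/hwup
copy_exec /bin/bash
copy_exec /sbin/hwdown

# Per-device configuration files.  Skip anything that is not a regular
# file, including the unexpanded pattern when the directory is empty.
for x in /etc/sysconfig/hardware/*
do
   [ -f "$x" ] || continue
   copy_exec "$x"
done

copy_exec /etc/sysconfig/scripts/common/functions

# The hwup-* helper scripts; same guard as above.
for x in /etc/sysconfig/scripts/hardware/*
do
   [ -f "$x" ] || continue
   copy_exec "$x"
done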

-----

Eighth, I then marked the script executable with

   chmod +x /etc/initramfs-tools/hooks/sysconfig-hardware

Ninth, I edited /etc/zipl.conf and changed the root file system specification
to use a UUID.  For example,

   parameters = root=UUID=3b516bc5-05d2-4d8c-99ca-55558ca7d47a ro vmhalt=LOGOFF vmpoff=LOGOFF

(Note that by default this needs to be changed in two places: one under
[debian] and one under [old].)  I had previously edited /etc/fstab
and /etc/initramfs-tools/conf.d/resume to use UUID specifications.
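
When there are several stanzas, the root= edit can be made in all of them at once.  The sketch below is an illustration, not a recommended tool.  set_zipl_root_uuid is a hypothetical name.  It relies on GNU sed's -i option.  It also assumes each root= value ends at a space or a double quote.

The UUID itself can be read with, for example, "blkid -s UUID -o value /dev/dasda1"; the device name is only an example.  zipl must be re-run afterwards.  On the author's system this happened as a side effect of update-initramfs.

```shell
#!/bin/sh
# Rewrite every root= parameter in a zipl.conf-style file (so both the
# [debian] and [old] stanzas) to refer to a filesystem UUID.
set_zipl_root_uuid() {
    uuid=$1
    conf=${2:-/etc/zipl.conf}
    # [^ "]* stops at the next space or at a closing double quote
    sed -i "s|root=[^ \"]*|root=UUID=$uuid|g" "$conf"
}
```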
Tenth, I rebuilt the initial RAM file system image with

   update-initramfs -uk $(uname -r)

This also caused zipl to be run.  Finally, I shut down and rebooted.
Everything worked perfectly.  RESUME processing found the partition
which is supposed to contain the RESUME image and tried to do a RESUME.
Of course, RESUME failed with a return code of -22 (-EINVAL), since no
SUSPEND had been done during shutdown; but at least that code was executed.
It never could find the partition before.  It also found the permanent
root file system by UUID and mounted it.  No problem.  The boot process
now works almost identically to the way it works on the i386 architecture.
Keeping the boot process as similar as possible across architectures
is a good thing!

I then shut down, booted my backup kernel, and rebuilt its initial
RAM file system image as well.
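
After rebuilding an image, you can check its contents to confirm that the hook copied what it should.  The sketch below reads a file listing of the image on standard input.  There are two ways to produce that listing:

  - On initramfs-tools versions that provide lsinitramfs, something like "lsinitramfs /boot/initrd.img-$(uname -r) | check_initramfs_listing" should work.
  - For a plain gzip-compressed image, "zcat /boot/initrd.img-$(uname -r) | cpio -t" gives a similar listing.

check_initramfs_listing is a hypothetical helper.  The expected paths are taken from the hook script above.

```shell
#!/bin/sh
# Read an initramfs file listing on stdin and report anything the
# sysconfig-hardware hook was expected to copy but is missing.
check_initramfs_listing() {
    # Normalise "./sbin/hwup" and "/sbin/hwup" to "sbin/hwup".
    listing=$(sed -e 's|^\./||' -e 's|^/||')
    rc=0
    for f in sbin/hwup sbin/hwdown bin/bash \
             lib/udev/rules.d/85-sysconfig-hardware.rules \
             etc/sysconfig/scripts/common/functions; do
        if ! printf '%s\n' "$listing" | grep -qxF "$f"; then
            echo "missing: $f" >&2
            rc=1
        fi
    done
    # The module sits under a kernel-version-specific path; match by name.
    if ! printf '%s\n' "$listing" | grep -q '/dasd_diag_mod\.ko$'; then
        echo "missing: dasd_diag_mod.ko" >&2
        rc=1
    fi
    return $rc
}
```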

The sysconfig-hardware method of bringing DASD devices online is
more complex than the direct kernel method, but it is more flexible.
There is a practical limit to the size of the string passed via the
dasd option to the dasd_mod module in an options statement in a .conf file
in the /etc/modprobe.d directory.  With sysconfig-hardware, however,
each device has its own configuration file.  This is much easier to
manage for Linux machines which use a large number of DASD devices.
However, until this bug (i.e. enhancement request) and all the other
bugs against sysconfig-hardware mentioned in this bug report are
fixed, the kernel method is the only method that is fully functional.
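
On a machine with many DASDs, the per-device files lend themselves to a loop.  ccw_range is a hypothetical helper.  It just prints consecutive bus IDs in the 0.0.xxxx form, relying on the shell's support for hexadecimal constants in arithmetic expansion.  What goes into each file is left to the caller.

```shell
#!/bin/sh
# Print consecutive ccw bus IDs from FIRST to LAST, each given as four hex
# digits (e.g. 0200 020f), one per line in 0.0.xxxx form.
ccw_range() {
    i=$((0x$1))
    last=$((0x$2))
    while [ "$i" -le "$last" ]; do
        printf '0.0.%04x\n' "$i"
        i=$((i + 1))
    done
}

# e.g. one configuration file per device:
#   for b in $(ccw_range 0200 020f); do echo "would configure $b"; done
```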

Notes:

(1) The "manual_add_modules dasd_diag_mod" line in
/etc/initramfs-tools/hooks/sysconfig-hardware is what causes the dasd_diag_mod
kernel module to be included in the initial RAM file system, since I use
MODULES=dep in /etc/initramfs-tools/conf.d/driver-policy.  This is
preferable to listing it in /etc/initramfs-tools/modules because listing
it in /etc/initramfs-tools/modules also causes initramfs-tools to attempt
to load it when it comes to the point of "loading essential drivers".
It may be too late by then.  The only way to make sure that dasd_diag_mod
gets loaded at the proper time is to use the two "softdep" lines in
/etc/modprobe.d/dasd.conf, as shown above.
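
Because so much depends on those two lines, a quick check that they are present can be useful.  The sketch below is rough.  has_diag_softdeps is a hypothetical name.  It inspects only the one file named, and ignores anything modprobe would read from elsewhere, such as other files in /etc/modprobe.d.

```shell
#!/bin/sh
# Succeed only if the given modprobe configuration file declares
# dasd_diag_mod as a "pre:" soft dependency of both dasd_eckd_mod and
# dasd_fba_mod.
has_diag_softdeps() {
    file=${1:-/etc/modprobe.d/dasd.conf}
    for m in dasd_eckd_mod dasd_fba_mod; do
        grep -Eq "^[[:space:]]*softdep[[:space:]]+$m[[:space:]]+pre:.*[[:space:]]dasd_diag_mod([[:space:]]|\$)" "$file" || return 1
    done
}
```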

(2) Note that /lib/udev/rules.d/85-sysconfig-hardware.rules was included
in the initial RAM file system, but
/lib/udev/rules.d/65-sysconfig-hardware-net.rules was not.  I did not
deem it necessary to fully configure network devices prior to the
initial read-only mount of the permanent root file system.  Network devices,
such as an OSA, will have their configuration finished after the
permanent root file system is initially mounted read-only.

(3) To reduce the size of the initial RAM file system, I tried to
remove all the "bashisms" from the scripts belonging to
sysconfig-hardware, so that they could run under ash.  This would
have allowed me to avoid including /bin/bash in the initial RAM file system.
Most of the bashisms could be easily eliminated, but arrays were a
problem.  For example, my OSA configuration file,
/etc/sysconfig/hardware/config-ccw-0.0.0300, contains the following:

   CCWGROUP_CHANS=(0.0.0300 0.0.0301 0.0.0302)

This is an array assignment statement, which gets "sourced" into
/etc/sysconfig/scripts/hardware/hwup-ccw-group during execution.
bash supports arrays, but ash does not.  In the end I decided that it
was more trouble than it was worth and left them all as bash scripts.
Thus, /bin/bash had to be included in the initial RAM file system.
Initial RAM file system size is not really a problem in s390.
For one thing, s390 has a lot fewer device driver modules to deal with
than, say, the i386 architecture.  Also, s390 is not subject to a
16M memory restriction, as LILO is on the i386 architecture for some
machines with a very old BIOS.  zipl can access up to 2G of RAM.
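
For comparison, here is one hypothetical way to avoid the array.  The channels are stored as a single whitespace-separated string, and the positional parameters stand in for array indexing.  This would mean changing both the configuration file format and hwup-ccw-group, so it is only a sketch of the idea.  The function names are invented.  The comma-joined form is the kind of list written, for example, to the qeth driver's group attribute in sysfs.

```shell
#!/bin/sh
# ash-compatible replacement for: CCWGROUP_CHANS=(0.0.0300 0.0.0301 0.0.0302)
CCWGROUP_CHANS="0.0.0300 0.0.0301 0.0.0302"

# Like ${#CCWGROUP_CHANS[@]}: number of channels
chan_count() ( set -- $CCWGROUP_CHANS; echo $# )

# Like ${CCWGROUP_CHANS[i]}: zero-based element i, failing if out of range
chan_at() (
    idx=$1
    set -- $CCWGROUP_CHANS
    [ "$idx" -lt $# ] || exit 1
    shift "$idx"
    echo "$1"
)

# All channels joined with commas
chan_csv() ( set -- $CCWGROUP_CHANS; IFS=,; echo "$*" )
```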

(4) Note that in my example the vmhalt=LOGOFF and vmpoff=LOGOFF kernel
boot parameters are used.  For a Linux system running in a virtual machine
under z/VM, this causes the CP LOGOFF command to be issued during shutdown
when a "halt" or "power-off" signal is received.  But this only works if
the vmcp kernel module is loaded.  I list vmcp in /etc/modules to accomplish
this.

-- 
  .''`.     Stephen Powell    
 : :'  :
 `. `'`
   `-


