
[SOLVED] Can't boot 2.6.32-3 after running 2.6.32-5 for a while



On Wed, 23 Jun 2010 18:05:49 -0400 (EDT), Camaleón wrote:
> 
> I'm interested in seeing how your testing goes.

OK, I found the problem.  It's a user error, but not an obvious one.
Here's the deal.  At boot loader install time, lilo translates the
root device specification into a number.  In the general case, the
number is a four-digit hexadecimal number, where the first two digits
represent the major number and the second two digits represent the
minor number.  (However, a leading zero is usually suppressed.)
When running under the 2.6.32-5 kernel, the root device, as viewed
by the kernel, is major number 8, minor number 2.  So lilo translated
the root specification internally to "802".  At boot time, it is
this number that is passed to the kernel as a boot parameter.  For
example,

   root=802
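
To make the arithmetic concrete, here is a small Python sketch of the
translation as I understand it.  This is not lilo's code.  The function
names are made up for illustration, and it assumes the simple case
described above, where the major and minor numbers each fit in two hex
digits.

```python
def root_number(major, minor):
    """Pack major/minor into hex digits, dropping any leading zero."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("sketch only covers 8-bit major and minor numbers")
    # Major number in the high byte, minor number in the low byte.
    return format((major << 8) | minor, "x")


def split_root_number(text):
    """Reverse the packing: hex string back to (major, minor)."""
    value = int(text, 16)
    return value >> 8, value & 0xFF


print(root_number(8, 2))         # prints 802
print(split_root_number("802"))  # prints (8, 2)
```

Without the suppressed leading zero, the same number would read 0802.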

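However, the IDE driver and the SCSI driver use different major
numbers.  The SCSI driver uses major number 8.  The IDE driver uses
major number 3.  Under the 2.6.32-5 kernel the disk shows up under the
SCSI driver (/dev/sda2), and under the 2.6.32-3 kernel it shows up
under the IDE driver (/dev/hda2).  The minor numbers were the same in
this case.  When booting the 2.6.32-3 kernel, lilo still passed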

   ... root=802 ...

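as a kernel boot parameter.  But under the 2.6.32-3 kernel, the root
partition is major number 3, minor number 2 (/dev/hda2), so there is
no device with major number 8, minor number 2.  Thus, the kernel could
not find the permanent root device, and the boot process hung.  The
solution was to use a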
direct specification of the UUID in /etc/lilo.conf instead of an
indirect one via a udev symbolic link.  For example, instead of
specifying

   root=/dev/disk/by-uuid/xxxxxxx...

I specified

   root="UUID=xxxxxxx..."

in /etc/lilo.conf.  I then re-ran lilo.  When this method of specifying
the root device is used, lilo does not pass a major and minor number as
a kernel boot parameter at boot time, but an actual UUID string,
which both kernels are able to match.  The 2.6.32-3 kernel matches it
with major number 3, minor number 2 (/dev/hda2) and the 2.6.32-5 kernel
matches it with major number 8, minor number 2 (/dev/sda2).  Problem
solved.
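
For anyone who wants to see the difference side by side, here is a toy
model in Python.  It is only an illustration, not how the kernel or
lilo really looks up devices.  The table, the function names, and the
"uuid-of-root" placeholder are all made up; only the major/minor
numbers and device names come from my system as described above.

```python
# Toy model: the root partition as each kernel sees it, keyed by
# (major, minor).  "uuid-of-root" is a placeholder, not a real UUID.
DEVICES = {
    "2.6.32-3": {(3, 2): {"name": "/dev/hda2", "uuid": "uuid-of-root"}},
    "2.6.32-5": {(8, 2): {"name": "/dev/sda2", "uuid": "uuid-of-root"}},
}


def find_root_by_number(kernel, root):
    """Look up root=<hex number>; the result depends on the driver."""
    value = int(root, 16)
    dev = DEVICES[kernel].get((value >> 8, value & 0xFF))
    return dev["name"] if dev else None


def find_root_by_uuid(kernel, uuid):
    """Look up root=UUID=...; the same file system under either driver."""
    for dev in DEVICES[kernel].values():
        if dev["uuid"] == uuid:
            return dev["name"]
    return None


for kernel in DEVICES:
    print(kernel,
          find_root_by_number(kernel, "802"),
          find_root_by_uuid(kernel, "uuid-of-root"))
# prints:
# 2.6.32-3 None /dev/hda2
# 2.6.32-5 /dev/sda2 /dev/sda2
```

The number only resolves under the kernel it was computed on, while
the UUID resolves under both.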

Note that it is OK to use a udev-created symbolic link for the "boot"
specification, since that is used only at boot loader install time
to determine where to write the boot block.  It is not passed to the
kernel at boot time.

I made some discoveries along the way which other users may find
interesting.  First, the program used by the Debian installer to
format a swap partition does not assign a UUID.  So you can't specify
your swap partition in /etc/fstab by means of a UUID until you reformat
your swap partition using the regular mkswap command.

Second, in the process of diagnosing the problem I switched from
MODULES=dep in /etc/initramfs-tools/conf.d/driver-policy, which produces
a relatively small initial RAM file system image (< 2M) to MODULES=most,
which produces a huge "kitchen sink" initial RAM file system (~ 8M).
I thought perhaps some missing modules in the initial RAM file system
might be the problem.  It wasn't.  But in the process, I proved that
lilo had no trouble loading this huge kernel + initrd combination,
as long as the large-memory option was present in /etc/lilo.conf.
This proves that the allegation made by some, that lilo can't handle
modern Linux kernels because they are too big, is a myth.  The real
problem is that lilo's boot loader installer was not getting run
during a kernel install or upgrade, due to changes made to the kernel
maintainer scripts.  I have since switched back to MODULES=dep and
I have re-run "update-initramfs -uk all" to rebuild the initial RAM
file systems.  At this size, even if large-memory is *not* present
in /etc/lilo.conf, lilo should have no trouble loading it.  I've
tried that too.

-- 
  .''`.     Stephen Powell    
 : :'  :
 `. `'`
   `-

