Re: LILO bug?
Alvin Oga wrote:
On Fri, 1 Jul 2005, Marty wrote:
> - how do you "know" that it is doing the right thing or not??
I verified the LILO update to the wrong (SCSI) disk, beyond just
observing that it had been rendered unbootable, if that's what you mean.
how did you "verify" that lilo wrote to the wrong disk
vs the scsi disk already having the mbr from prev installs
In addition to the SCSI disk becoming unbootable, the SCSI drive LED
flashed when LILO wrote to it, and pulling the SCSI cable (not recommended)
caused the LILO installation to fail while attempting to write to the disk
and to return an error to that effect.
As I mentioned before, the target partition /dev/hda1 subsequently failed to
boot, and removing the SCSI disk from the system bypassed the entire problem.
I just realized I left out an important detail. At the time I ran LILO
the SCSI disk was the boot drive, and the root partition was /dev/hda1 as
specified in the root parameter on the LILO boot prompt. This is how I
could get away with unplugging the SCSI drive, in case you wondered. :-)
I did try booting with a rescue floppy and with the SCSI drive still in the
system but that did not solve the problem.
how do you know the "lilo stuff" you see on the screen is coming
from hda vs sda
I don't know offhand how to map the BIOS disk numbers (0x80, 0x81, ...)
to the Linux device numbers (0x300, 0x301, ...). That's why I performed
the other tests to verify the actual write to the SCSI disk.
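For reference, the Linux side of that mapping can at least be decoded mechanically. Below is a minimal Python sketch, assuming the classic 16-bit device number layout (major in the high byte, minor in the low byte) and the usual IDE (majors 3 and 22) and SCSI disk (major 8) assignments. The function names are made up for illustration. It says nothing about which BIOS number (0x80, 0x81, ...) a given disk receives, since that depends on the BIOS boot order.

```python
# First drive letter served by each classic IDE major number.
IDE_FIRST_LETTER = {3: "a", 22: "c"}


def decode_dev(devno):
    """Split a classic 16-bit Linux device number into (major, minor)."""
    return (devno >> 8) & 0xFF, devno & 0xFF


def dev_name(devno):
    """Return the conventional /dev name, or None if the major is not handled."""
    major, minor = decode_dev(devno)
    if major in IDE_FIRST_LETTER:
        # IDE: two disks per major, 64 minors per disk (minor 0 = whole disk).
        disk, part = divmod(minor, 64)
        if disk > 1:
            return None
        prefix = "hd"
        letter = chr(ord(IDE_FIRST_LETTER[major]) + disk)
    elif major == 8:
        # SCSI disks: 16 minors per disk (minor 0 = whole disk).
        disk, part = divmod(minor, 16)
        prefix = "sd"
        letter = chr(ord("a") + disk)
    else:
        return None
    return "/dev/" + prefix + letter + (str(part) if part else "")


if __name__ == "__main__":
    for devno in (0x300, 0x301, 0x800, 0x801):
        print(hex(devno), dev_name(devno))
```

So 0x300 and 0x301 are the whole disk hda and its first partition, while anything on the SCSI disk starts at 0x800.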
- did you make the look-n-feel different on the different disks
so you know that the mbr you see on the lilo boot screen is
the one corresponding to each different disks
As I pointed out before, I don't get as far as booting. The target drive
hda still doesn't boot and the SCSI drive stops booting.
- label the boot kernel differently, explicitly for each
- that is a trivial way to see which MBR is being booted
- it's NOT possible, or an extremely slim chance, that lilo writes to the wrong disk
The impossible seems to be happening. I'd entertain alternate theories.
> - did you delete the mbr info on /dev/hda or /dev/hda1 or /dev/hda2 ...
> on each partition you are trying to test
I'm not trying to test any partitions, and deleting MBRs generally seems
like a bad idea to propose as a debugging procedure.
then you cannot claim there is a bug if you cannot explicitly confirm it
and duplicate the error
I've seen it many times on different systems, hence the workaround. I don't know
if others can duplicate it or not, hence one of the reasons for posting.
> - did you look at the contents of the mbr BEFORE and AFTER you
> ran lilo
No, but I think the write to the wrong disk takes precedence over
specifics about what data gets written and where.
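That said, a before/after comparison of the boot sectors would be cheap to do. Here is a rough Python sketch of such a check, assuming the first 512 bytes of each disk are read as root (for example from /dev/hda and /dev/sda) before and after running lilo. The helper names are hypothetical, and the code only reads, it never writes.

```python
import hashlib

SECTOR_SIZE = 512  # size of the MBR / boot sector in bytes


def read_boot_sector(path):
    """Read the first sector of a disk device or image file."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


def has_boot_signature(sector):
    """True if the sector ends with the 0x55 0xAA boot signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


def sector_digest(sector):
    """Short fingerprint of a sector, convenient for noting down."""
    return hashlib.sha256(sector).hexdigest()


def changed_offsets(before, after):
    """Byte offsets at which two sector snapshots differ."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
```

Taking one snapshot per disk before running lilo and one after, then calling changed_offsets() on each pair, would show which disk's boot sector actually changed.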
not possible ... more likely that there's something you forgot or
overlooked ... lilo does NOT write to the wrong disk ... it does
NOT overwrite anything
- which version of lilo ...
- did you go to the lilo site to download the latest version
that was previously posted ...
The last time I hit the problem was a few months ago. It's possible that
a recent version has fixed the problem, but a quick scan of the changelogs
doesn't seem to indicate that.
> - did you change the bios boot order
Again, I tried using disk/bios stanzas without success.
bios boot order has NOTHING to do with "disk stanza" (presumably in lilo)
I changed the boot drive in the BIOS, if that's what you mean. That's how
I discovered the problem.