Bug#310653: Partial failure (lilo failed): i386 (4 * PPro): Woody->Sarge upgrade
reassign 310653 lilo
retitle 310653 lilo fails on /dev/sda when /dev/sde is removable and removed.
severity 310653 important
quit
On Wed, May 25, 2005 at 01:18:30PM +1200, Ewen McNeill wrote:
> Package: upgrade-reports
>
> lilo failed to install the boot blocks (with the issue described below),
> but (a) the package didn't report it had failed to install, and (b)
> the upgrade proceeded on, so the error message was lost in the noise of
> all the other upgrade messages. (I only know it was reported because
> I have script output of the whole upgrade process and, after manually
> running lilo, knew what error message to search for.)
This is bug #304260 "lilo ignores failures on upgrade, leaving the
system silently unbootable".
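For illustration only, here is a minimal Python sketch of the behaviour that bug asks for. It assumes that the upgrade hook runs the boot loader installer as a child process and that lilo exits with a nonzero status on errors such as the one reported below. The function name run_boot_installer is hypothetical; this is not the code the Debian lilo package actually uses.

```python
import subprocess
import sys


def run_boot_installer(cmd):
    """Run a boot loader installer (hypothetically ["lilo"]) and fail loudly.

    Assumes the installer signals failure through a nonzero exit status.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Echo the installer's own diagnostics so they are not lost among
        # the other upgrade messages, then abort the caller.
        sys.stderr.write(result.stderr)
        raise RuntimeError(
            f"{cmd[0]} failed with exit status {result.returncode}; "
            "the system may not be bootable"
        )
    return result.stdout
```

With this wrapper, a call such as run_boot_installer(["lilo"]) stops the upgrade step with an error instead of letting it continue silently.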
> - Were there any problems with the system after upgrading?
>
> lilo would not reinstall the boot records. A basic "lilo -v" boot
> run reported:
>
> -=- cut here -=-
> LILO version 22.6.1, Copyright (C) 1992-1998 Werner Almesberger
> Development beyond version 21 Copyright (C) 1999-2004 John Coffman
> Released 17-Nov-2004, and compiled at 22:18:56 on Mar 12 2005
> Debian GNU/Linux
>
> Reading boot sector from /dev/sda
> part_nowrite: read:: Input/output error
> ewen@digital:~$
> -=- cut here -=-
>
> On further investigation this turned out to be due to a read error on
> one or both of the removable disk SCSI devices (a magnetoptical and a
> zip drive), quite probably due to there not being a disk in the drive
> (the system is remote from where I was performing the upgrade, but I'm
> fairly certain there's no disk in at least one of those drives).
>
> Output from "lilo -v 5" included below, as well as a summarised copy
> of the /etc/lilo.conf.
>
> The relevant removable scsi disks are /dev/sde and /dev/sdf; these
> are in no way involved in the boot process (which only uses /dev/sda, an
> 18GB internal SCSI disk). I cannot actually determine why lilo was
> even trying to access those disks.
>
> To the best of my recollection I have run lilo before (under Woody)
> with both the MO and Zip drives attached and without media in one
> or both of them. I believe the issue is new with the version of Lilo
> in Debian Sarge.
>
> I eventually resolved the issue by telling the SCSI driver to remove
> the two SCSI removable disk drives (echo "scsi remove-single-device
> HOST CHANNEL ID LUN" >/proc/scsi/scsi), and then running lilo. Lilo
> then reported that it had successfully installed the boot records.
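The HOST CHANNEL ID LUN values for that workaround can be read from the kernel's "Attached scsi removable disk" lines, in the format shown in the dmesg output further down. The following minimal Python sketch assumes that format and builds the remove-single-device command string for each requested drive. The function name removal_commands is hypothetical. The sketch only builds the strings; writing them to /proc/scsi/scsi as root is left to the administrator.

```python
import re

# Matches kernel lines such as:
#   Attached scsi removable disk sde at scsi2, channel 0, id 2, lun 0
ATTACHED_RE = re.compile(
    r"Attached scsi removable disk (sd[a-z]+) at scsi(\d+), "
    r"channel (\d+), id (\d+), lun (\d+)"
)


def removal_commands(dmesg_text, wanted):
    """Map each wanted removable disk name to its /proc/scsi/scsi command.

    Non-removable disks never match the pattern, so they are never included.
    """
    commands = {}
    for match in ATTACHED_RE.finditer(dmesg_text):
        name, host, channel, target, lun = match.groups()
        if name in wanted:
            commands[name] = (
                f"scsi remove-single-device {host} {channel} {target} {lun}"
            )
    return commands
```

For the two drives in this report, the sketch yields "scsi remove-single-device 2 0 2 0" for sde and "scsi remove-single-device 2 0 6 0" for sdf.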
>
> However, that level of knowledge will be beyond most users
> (I only knew it was possible because I have previously needed to
> rescan the SCSI devices to force detection of IEEE 1394-attached
> "scsi" devices, and to "find" devices that were turned off during boot).
>
> It seems to me that lilo should access only the devices that its
> configuration file tells it to use, or at least not fail when
> attempts to access other devices are unsuccessful.
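To make the second option concrete, here is a hypothetical Python sketch of a disk scan. It treats a read error as fatal only for devices named in the configuration and skips any other unreadable device with a warning. lilo itself is written in C, and its real scan (the pf_hard_disk_scan steps in the trace below) is not reproduced here. scan_disks and the read_boot_sector callback are illustrative names only.

```python
import sys
from typing import Callable, Iterable, List, Set


def scan_disks(candidates: Iterable[str],
               configured: Set[str],
               read_boot_sector: Callable[[str], bytes]) -> List[str]:
    """Return the candidate devices whose boot sector could be read.

    A read error on a configured device is re-raised (fatal); a read error
    on any other device, such as an empty removable drive, is skipped.
    """
    readable = []
    for dev in candidates:
        try:
            read_boot_sector(dev)
        except OSError as err:
            if dev in configured:
                raise  # the configuration depends on this disk
            print(f"warning: skipping unreadable {dev}: {err}", file=sys.stderr)
            continue
        readable.append(dev)
    return readable
```

The stricter alternative, probing only the configured disks, amounts to filtering candidates against configured before the loop.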
>
> Presumably given the impact of updating the lilo package (and install
> sets, boot floppies, etc) it is too late to do anything about this for
> Debian Sarge's release. It may, however, be worth documenting in the
> release notes as a possible problem. I assume that inserting media into
> the removable drives would also be an adequate workaround; as noted
> above I'm not near the host to try that.
>
> Disks:
>
> /dev/sda to /dev/sdd are internal SCSI disks of various sizes;
> /dev/sde is a MO disk, /dev/sdf is a Zip disk
>
> Summarised dmesg output
>
> -=- cut here -=-
> ewen@digital:~$ dmesg | egrep 'disk sd|Vendor|Type'
> ....... : Delivery Type: 0
> Vendor: IBM Model: DDYS-T18350N Rev: S96H
> Type: Direct-Access ANSI SCSI revision: 03
> Vendor: IBM Model: DNES-318350Y Rev: SA30
> Type: Direct-Access ANSI SCSI revision: 03
> Vendor: DEC Model: RZ29B (C) DEC Rev: 0014
> Type: Direct-Access ANSI SCSI revision: 02
> Vendor: SEAGATE Model: ST31055W Rev: 0596
> Type: Direct-Access ANSI SCSI revision: 02
> Vendor: PLEXTOR Model: CD-R PX-R820T Rev: 1.03
> Type: CD-ROM ANSI SCSI revision: 02
> Vendor: DEC Model: DLT2000 Rev: 830A
> Type: Sequential-Access ANSI SCSI revision: 02
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
> Attached scsi disk sdc at scsi0, channel 0, id 12, lun 0
> Attached scsi disk sdd at scsi0, channel 0, id 13, lun 0
> Vendor: Maxoptix Model: T3-1304 Rev: 1.1d
> Type: Optical Device ANSI SCSI revision: 02
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Vendor: NRC Model: MBR-7 Rev: 110
> Type: CD-ROM ANSI SCSI revision: 02
> Vendor: IOMEGA Model: ZIP 100 Rev: J.02
> Type: Direct-Access ANSI SCSI revision: 02
> Attached scsi removable disk sde at scsi2, channel 0, id 2, lun 0
> Attached scsi removable disk sdf at scsi2, channel 0, id 6, lun 0
> ewen@digital:~$
> -=- cut here -=-
>
> (Summarised) lilo configuration:
>
> -=- cut here -=-
> ewen@digital:~$ grep -v "^#" /etc/lilo.conf | grep '[a-z]'
> lba32
> disk=/dev/sda
> bios=0x80
> boot=/dev/sda
> root=/dev/sda1
> install=menu
> map=/boot/map
> delay=20
> vga=normal
> default=Linux
> image=/vmlinuz
> label=Linux
> read-only
> image=/vmlinuz.old
> label=LinuxOLD
> read-only
> optional
> ewen@digital:~$
> -=- cut here -=-
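As a cross-check of which devices this configuration names, here is a minimal Python sketch that collects the boot=, disk= and root= values. It assumes that those three options are the only device-naming ones relevant here; lilo.conf has other options, which this sketch ignores. referenced_devices is a hypothetical name. For the configuration above, the sketch yields only /dev/sda and /dev/sda1.

```python
# Options whose values name block devices in the configuration above;
# any other lilo.conf option is ignored by this sketch.
DEVICE_OPTIONS = ("boot", "disk", "root")


def referenced_devices(conf_text):
    """Return the set of /dev paths named by boot=, disk= or root= lines."""
    devices = set()
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in DEVICE_OPTIONS and value.startswith("/dev/"):
            devices.add(value)
    return devices
```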
>
> Extra verbose lilo output:
>
> -=- cut here -=-
> ewen@digital:~$ sudo lilo -v 5
> LILO version 22.6.1, Copyright (C) 1992-1998 Werner Almesberger
> Development beyond version 21 Copyright (C) 1999-2004 John Coffman
> Released 17-Nov-2004, and compiled at 22:18:56 on Mar 12 2005
> Debian GNU/Linux
>
> raid_setup: dev=0801 rdev=0800
> raid_setup returns offset = 00000000 ndisk = 0
> BIOS VolumeID Device
> Reading boot sector from /dev/sda
> geo_get: device 0800, all=1
> pf_hard_disk_scan: (8,0) /dev/sda
> pf_hard_disk_scan: (8,1) /dev/sda1
> lookup_dev: number=0800
> lookup_dev: number=0800
> pf: dev=0800 id=00000000 name=/dev/sda
> geo_query_dev: device=0800
> lookup_dev: number=0800
> lookup_dev: number=0300
> exit geo_query_dev
> bios_dev: device 0800
> lookup_dev: number=0800
> bios_dev: masked device 0800, which is /dev/sda
> bios_dev: geometry check found 0 matches
> bios_dev: (0x83) vol-ID=00000000 *PT=08078E9C
> bios_dev: (0x82) vol-ID=00000000 *PT=08078E54
> bios_dev: (0x81) vol-ID=00000000 *PT=08078E0C
> bios_dev: (0x80) vol-ID=00000000 *PT=08078DC4
> bios_dev: PT match found 1 match (0x80)
> pf_hard_disk_scan: (8,2) /dev/sda2
> pf_hard_disk_scan: (8,5) /dev/sda5
> pf_hard_disk_scan: (8,6) /dev/sda6
> pf_hard_disk_scan: (8,7) /dev/sda7
> pf_hard_disk_scan: (8,16) /dev/sdb
> pf_hard_disk_scan: (8,20) /dev/sdb4
> lookup_dev: number=0810
> lookup_dev: number=0810
> pf: dev=0810 id=00000000 name=/dev/sdb
> geo_query_dev: device=0810
> lookup_dev: number=0810
> exit geo_query_dev
> bios_dev: device 0810
> lookup_dev: number=0810
> bios_dev: masked device 0810, which is /dev/sdb
> bios_dev: geometry check found 0 matches
> bios_dev: (0x83) vol-ID=00000000 *PT=08078E9C
> bios_dev: (0x82) vol-ID=00000000 *PT=08078E54
> bios_dev: (0x81) vol-ID=00000000 *PT=08078E0C
> bios_dev: (0x80) vol-ID=00000000 *PT=08078DC4
> bios_dev: PT match found 1 match (0x81)
> pf_hard_disk_scan: (8,21) /dev/sdb5
> pf_hard_disk_scan: (8,32) /dev/sdc
> pf_hard_disk_scan: (8,33) /dev/sdc1
> lookup_dev: number=0820
> lookup_dev: number=0820
> pf: dev=0820 id=00000000 name=/dev/sdc
> geo_query_dev: device=0820
> lookup_dev: number=0820
> exit geo_query_dev
> bios_dev: device 0820
> bios_dev: match on geometry alone (0x82)
> pf_hard_disk_scan: (8,34) /dev/sdc2
> pf_hard_disk_scan: (8,37) /dev/sdc5
> pf_hard_disk_scan: (8,38) /dev/sdc6
> pf_hard_disk_scan: (8,48) /dev/sdd
> pf_hard_disk_scan: (8,49) /dev/sdd1
> lookup_dev: number=0830
> lookup_dev: number=0830
> pf: dev=0830 id=00000000 name=/dev/sdd
> geo_query_dev: device=0830
> lookup_dev: number=0830
> exit geo_query_dev
> bios_dev: device 0830
> bios_dev: match on geometry alone (0x83)
> pf_hard_disk_scan: (8,53) /dev/sdd5
> pf_hard_disk_scan: (8,64) /dev/sde
> pf_hard_disk_scan: (8,65) /dev/sde1
> lookup_dev: number=0840
> lookup_dev: number=0840
> pf: dev=0840 id=6B736964 name=/dev/sde
> geo_query_dev: device=0840
> lookup_dev: number=0840
> exit geo_query_dev
> bios_dev: device 0840
> lookup_dev: number=0840
> bios_dev: masked device 0840, which is /dev/sde
> bios_dev: geometry check found 0 matches
> bios_dev: (0x83) vol-ID=00000000 *PT=08078E9C
> bios_dev: (0x82) vol-ID=00000000 *PT=08078E54
> bios_dev: (0x81) vol-ID=00000000 *PT=08078E0C
> bios_dev: (0x80) vol-ID=00000000 *PT=08078DC4
> bios_dev: PT match found 0 matches (0xFF)
> bios_dev: S/N match found 0 matches (0xFFFFFFFF)
> part_nowrite: read:: Input/output error
> -=- cut here -=-
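The trace can be narrowed down mechanically. Device numbers in it are printed as four hex digits, with the major number in the high byte and the minor number in the low byte. For example, dev=0840 is major 8, minor 64, which the trace itself pairs with /dev/sde. Here is a minimal Python sketch that reports the last device probed before the Input/output error. It assumes the "pf: dev=... id=... name=..." line format shown above. last_probed_before_error and split_devno are hypothetical helper names.

```python
import re

# Matches trace lines such as "pf: dev=0840 id=6B736964 name=/dev/sde".
PF_RE = re.compile(r"pf: dev=([0-9A-Fa-f]{4}) id=[0-9A-Fa-f]+ name=(\S+)")


def split_devno(hexdev):
    """Split a 16-bit device number such as '0840' into (major, minor)."""
    value = int(hexdev, 16)
    return value >> 8, value & 0xFF


def last_probed_before_error(log_text, error="Input/output error"):
    """Return (name, major, minor) of the last 'pf:' device before `error`.

    Returns None if the error never appears or no device was probed first.
    """
    last = None
    for line in log_text.splitlines():
        match = PF_RE.search(line)
        if match:
            last = (match.group(2),) + split_devno(match.group(1))
        elif error in line:
            return last
    return None
```

Applied to the full trace above, this returns /dev/sde (major 8, minor 64), the MO drive, which agrees with the diagnosis in the report.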
>
>
> Further Comments/Problems:
>
> The host is a quad-processor Pentium Pro 200 system with 512MB of RAM.
> Originally it was a Digital Prioris, but a number of parts (SCSI disks,
> SCSI controllers, etc) have been changed since the original stock system.
>
> I haven't yet rebooted this system, so I'm not certain that the
> lilo boot block install worked, even after removing the problematic
> SCSI devices. Given the apparent boot block install issues, I want to
> wait until I am beside the host before rebooting it.
>
> If desired, I can report later this week whether the reboot was successful.
This second problem looks like a general lilo problem, not specific to the
upgrade, so I am reassigning it to lilo.
Thanks a lot for your detailed bug report,
--
Bill. <ballombe@debian.org>
Imagine a large red swirl here.