
Bug#310653: Partial failure (lilo failed): i386 (4 * PPro): Woody->Sarge upgrade



reassign 310653 lilo
retitle 310653 lilo fails on /dev/sda when /dev/sde is removable and removed. 
severity 310653 important
quit
On Wed, May 25, 2005 at 01:18:30PM +1200, Ewen McNeill wrote:
> Package: upgrade-reports
> 
> lilo failed to install the boot blocks (with the issue described below),
> but (a) the package didn't report it had failed to install, and (b)
> the upgrade proceeded on, so the error message was lost in the noise of
> all the other upgrade messages.  (I only know it did get reported because
> I have a script output of the whole upgrade process and, after manually
> running lilo, I knew which error message to search for.)

This is bug #304260 "lilo ignores failures on upgrade, leaving the
system silently unbootable".
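
As a rough illustration of the behaviour #304260 asks for, here is a
minimal sketch. It is not taken from the lilo package's maintainer
scripts; the helper name run_bootloader is hypothetical. It runs the given
command, reports a failure prominently on stderr, and hands the exit
status back to the caller instead of discarding it.

```shell
#!/bin/sh
# Hypothetical helper (assumption: not the real lilo postinst logic).
# Runs the command given as arguments and propagates its exit status.
run_bootloader() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "ERROR: '$*' failed with exit status $rc." >&2
        echo "The system may not boot until this is fixed." >&2
    fi
    return "$rc"
}

# Possible use in an upgrade script (commented out so that sourcing this
# sketch has no side effects):
# run_bootloader lilo || exit 1
```

With something along these lines the upgrade would stop at the failing
step, so the message would not be buried among the other upgrade output.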

> - Were there any problems with the system after upgrading?
> 
> lilo would not reinstall the boot records.  A basic "lilo -v" boot
> run reported:
> 
> -=- cut here -=-
> LILO version 22.6.1, Copyright (C) 1992-1998 Werner Almesberger
> Development beyond version 21 Copyright (C) 1999-2004 John Coffman
> Released 17-Nov-2004, and compiled at 22:18:56 on Mar 12 2005
> Debian GNU/Linux
> 
> Reading boot sector from /dev/sda
> part_nowrite: read:: Input/output error
> ewen@digital:~$
> -=- cut here -=-
> 
> On further investigation this turned out to be due to a read error on 
> one or both of the removable disk SCSI devices (a magneto-optical and a
> zip drive), quite probably due to there not being a disk in the drive
> (the system is remote from where I was performing the upgrade, but I'm
> fairly certain there's no disk in at least one of those drives).
> 
> Output from "lilo -v 5" included below, as well as a summarised copy
> of the /etc/lilo.conf.
> 
> The relevant removable scsi disks are /dev/sde and /dev/sdf; these
> are in no way involved in the boot process (which only uses /dev/sda, an
> 18GB internal SCSI disk).  I cannot actually determine why lilo was
> even trying to access those disks.
> 
> To the best of my recollection I have run lilo before (under Woody) 
> with both the MO and Zip drives attached and without media in one
> or both of them.  I believe the issue is new with the version of Lilo
> in Debian Sarge.
> 
> I eventually resolved the issue by telling the SCSI driver to remove
> the two removable SCSI disk drives (echo "scsi remove-single-device
> HOST CHANNEL ID LUN" >/proc/scsi/scsi), and then running lilo.  Lilo
> then reported that it had successfully installed the boot records.
> 
> However, that level of advanced knowledge will be beyond most users
> (I only knew it was possible because I have needed to rescan the SCSI
> devices to force detection of IEEE 1394 attached "scsi" devices, and to
> "find" devices that were turned off during boot).
> 
> It seems to me that lilo should access only the devices that its
> configuration file tells it to use, or at least not fail when
> attempts to access other devices are unsuccessful.
> 
> Presumably given the impact of updating the lilo package (and install
> sets, boot floppies, etc) it is too late to do anything about this for
> Debian Sarge's release.  It may, however, be worth documenting in the
> release notes as a possible problem.  I assume that inserting media into
> the removable drives would also be an adequate workaround; as noted
> above I'm not near the host to try that.
> 
> Disks:
> 
> /dev/sda to /dev/sdd are internal SCSI disks of various sizes; 
> /dev/sde is a MO disk, /dev/sdf is a Zip disk
> 
> Summarised dmesg output
> 
> -=- cut here -=-
> ewen@digital:~$ dmesg | egrep 'disk sd|Vendor|Type'
> .......    : Delivery Type: 0
>   Vendor: IBM       Model: DDYS-T18350N      Rev: S96H
>   Type:   Direct-Access                      ANSI SCSI revision: 03
>   Vendor: IBM       Model: DNES-318350Y      Rev: SA30
>   Type:   Direct-Access                      ANSI SCSI revision: 03
>   Vendor: DEC       Model: RZ29B    (C) DEC  Rev: 0014
>   Type:   Direct-Access                      ANSI SCSI revision: 02
>   Vendor: SEAGATE   Model: ST31055W          Rev: 0596
>   Type:   Direct-Access                      ANSI SCSI revision: 02
>   Vendor: PLEXTOR   Model: CD-R   PX-R820T   Rev: 1.03
>   Type:   CD-ROM                             ANSI SCSI revision: 02
>   Vendor: DEC       Model: DLT2000           Rev: 830A
>   Type:   Sequential-Access                  ANSI SCSI revision: 02
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
> Attached scsi disk sdc at scsi0, channel 0, id 12, lun 0
> Attached scsi disk sdd at scsi0, channel 0, id 13, lun 0
>   Vendor: Maxoptix  Model: T3-1304           Rev: 1.1d
>   Type:   Optical Device                     ANSI SCSI revision: 02
>   Vendor: NRC       Model: MBR-7             Rev: 110 
>   Type:   CD-ROM                             ANSI SCSI revision: 02
>   Vendor: NRC       Model: MBR-7             Rev: 110 
>   Type:   CD-ROM                             ANSI SCSI revision: 02
>   Vendor: NRC       Model: MBR-7             Rev: 110 
>   Type:   CD-ROM                             ANSI SCSI revision: 02
>   Vendor: NRC       Model: MBR-7             Rev: 110 
>   Type:   CD-ROM                             ANSI SCSI revision: 02
>   Vendor: NRC       Model: MBR-7             Rev: 110 
>   Type:   CD-ROM                             ANSI SCSI revision: 02
>   Vendor: NRC       Model: MBR-7             Rev: 110 
>   Type:   CD-ROM                             ANSI SCSI revision: 02
>   Vendor: NRC       Model: MBR-7             Rev: 110 
>   Type:   CD-ROM                             ANSI SCSI revision: 02
>   Vendor: IOMEGA    Model: ZIP 100           Rev: J.02
>   Type:   Direct-Access                      ANSI SCSI revision: 02
> Attached scsi removable disk sde at scsi2, channel 0, id 2, lun 0
> Attached scsi removable disk sdf at scsi2, channel 0, id 6, lun 0
> ewen@digital:~$ 
> -=- cut here -=-
> 
> (Summarised) lilo configuration:
> 
> -=- cut here -=-
> ewen@digital:~$ grep -v "^#" /etc/lilo.conf | grep '[a-z]'
> lba32
> disk=/dev/sda
>   bios=0x80
> boot=/dev/sda
> root=/dev/sda1
> install=menu
> map=/boot/map
> delay=20
> vga=normal
> default=Linux
> image=/vmlinuz
>         label=Linux
>         read-only
> image=/vmlinuz.old
>         label=LinuxOLD
>         read-only
>         optional
> ewen@digital:~$
> -=- cut here -=-
> 
> Extra verbose lilo output:
> 
> -=- cut here -=-
> ewen@digital:~$ sudo lilo -v 5
> LILO version 22.6.1, Copyright (C) 1992-1998 Werner Almesberger
> Development beyond version 21 Copyright (C) 1999-2004 John Coffman
> Released 17-Nov-2004, and compiled at 22:18:56 on Mar 12 2005
> Debian GNU/Linux
> 
> raid_setup: dev=0801  rdev=0800
> raid_setup returns offset = 00000000  ndisk = 0
>  BIOS   VolumeID   Device
> Reading boot sector from /dev/sda
> geo_get: device 0800, all=1
> pf_hard_disk_scan: (8,0) /dev/sda
> pf_hard_disk_scan: (8,1) /dev/sda1
> lookup_dev:  number=0800
> lookup_dev:  number=0800
> pf:  dev=0800  id=00000000  name=/dev/sda
> geo_query_dev: device=0800
> lookup_dev:  number=0800
> lookup_dev:  number=0300
> exit geo_query_dev
> bios_dev:  device 0800
> lookup_dev:  number=0800
> bios_dev:  masked device 0800, which is /dev/sda
> bios_dev: geometry check found 0 matches
> bios_dev: (0x83)  vol-ID=00000000  *PT=08078E9C
> bios_dev: (0x82)  vol-ID=00000000  *PT=08078E54
> bios_dev: (0x81)  vol-ID=00000000  *PT=08078E0C
> bios_dev: (0x80)  vol-ID=00000000  *PT=08078DC4
> bios_dev: PT match found 1 match (0x80)
> pf_hard_disk_scan: (8,2) /dev/sda2
> pf_hard_disk_scan: (8,5) /dev/sda5
> pf_hard_disk_scan: (8,6) /dev/sda6
> pf_hard_disk_scan: (8,7) /dev/sda7
> pf_hard_disk_scan: (8,16) /dev/sdb
> pf_hard_disk_scan: (8,20) /dev/sdb4
> lookup_dev:  number=0810
> lookup_dev:  number=0810
> pf:  dev=0810  id=00000000  name=/dev/sdb
> geo_query_dev: device=0810
> lookup_dev:  number=0810
> exit geo_query_dev
> bios_dev:  device 0810
> lookup_dev:  number=0810
> bios_dev:  masked device 0810, which is /dev/sdb
> bios_dev: geometry check found 0 matches
> bios_dev: (0x83)  vol-ID=00000000  *PT=08078E9C
> bios_dev: (0x82)  vol-ID=00000000  *PT=08078E54
> bios_dev: (0x81)  vol-ID=00000000  *PT=08078E0C
> bios_dev: (0x80)  vol-ID=00000000  *PT=08078DC4
> bios_dev: PT match found 1 match (0x81)
> pf_hard_disk_scan: (8,21) /dev/sdb5
> pf_hard_disk_scan: (8,32) /dev/sdc
> pf_hard_disk_scan: (8,33) /dev/sdc1
> lookup_dev:  number=0820
> lookup_dev:  number=0820
> pf:  dev=0820  id=00000000  name=/dev/sdc
> geo_query_dev: device=0820
> lookup_dev:  number=0820
> exit geo_query_dev
> bios_dev:  device 0820
> bios_dev: match on geometry alone (0x82)
> pf_hard_disk_scan: (8,34) /dev/sdc2
> pf_hard_disk_scan: (8,37) /dev/sdc5
> pf_hard_disk_scan: (8,38) /dev/sdc6
> pf_hard_disk_scan: (8,48) /dev/sdd
> pf_hard_disk_scan: (8,49) /dev/sdd1
> lookup_dev:  number=0830
> lookup_dev:  number=0830
> pf:  dev=0830  id=00000000  name=/dev/sdd
> geo_query_dev: device=0830
> lookup_dev:  number=0830
> exit geo_query_dev
> bios_dev:  device 0830
> bios_dev: match on geometry alone (0x83)
> pf_hard_disk_scan: (8,53) /dev/sdd5
> pf_hard_disk_scan: (8,64) /dev/sde
> pf_hard_disk_scan: (8,65) /dev/sde1
> lookup_dev:  number=0840
> lookup_dev:  number=0840
> pf:  dev=0840  id=6B736964  name=/dev/sde
> geo_query_dev: device=0840
> lookup_dev:  number=0840
> exit geo_query_dev
> bios_dev:  device 0840
> lookup_dev:  number=0840
> bios_dev:  masked device 0840, which is /dev/sde
> bios_dev: geometry check found 0 matches
> bios_dev: (0x83)  vol-ID=00000000  *PT=08078E9C
> bios_dev: (0x82)  vol-ID=00000000  *PT=08078E54
> bios_dev: (0x81)  vol-ID=00000000  *PT=08078E0C
> bios_dev: (0x80)  vol-ID=00000000  *PT=08078DC4
> bios_dev: PT match found 0 matches (0xFF)
> bios_dev: S/N match found 0 matches (0xFFFFFFFF)
> part_nowrite: read:: Input/output error
> -=- cut here -=-
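
For anyone reading the trace above: the four-digit hex values are device
numbers, and the last one examined before the read error is 0840. A small
sketch for decoding them follows. The helper name decode_dev is
hypothetical, and it assumes the classic layout visible in the trace
itself (high byte is the major number, low byte the minor, and SCSI disks
on major 8 get 16 minors each).

```shell
#!/bin/sh
# Hypothetical helper: decode a 4-hex-digit device number as printed by
# "lilo -v 5" (for example 0840) into major, minor and, for SCSI disks,
# the /dev/sdX name.
# Assumption: high byte = major, low byte = minor, 16 minors per sd disk.
decode_dev() {
    num=$((0x$1))
    major=$(($num >> 8))
    minor=$(($num & 255))
    if [ "$major" -ne 8 ]; then
        # Not a SCSI disk: report the raw numbers only.
        echo "major=$major minor=$minor"
        return 0
    fi
    # Disk letter from the minor number: 0-15 is a, 16-31 is b, and so on.
    letter=$(printf '%s' abcdefghijklmnop | cut -c "$(($minor / 16 + 1))")
    part=$(($minor % 16))
    if [ "$part" -eq 0 ]; then
        echo "major=8 minor=$minor /dev/sd$letter"
    else
        echo "major=8 minor=$minor /dev/sd$letter$part"
    fi
}

decode_dev 0840   # prints: major=8 minor=64 /dev/sde
```

This agrees with the "pf_hard_disk_scan: (8,64) /dev/sde" line in the
trace, i.e. lilo was looking at the first removable drive when it failed.
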
> 
> 
> Further Comments/Problems:
> 
> The host is a quad-processor Pentium Pro 200 system with 512MB of RAM.
> Originally it was a Digital Prioris, but a number of parts (SCSI disks,
> SCSI controllers, etc) have been changed since the original stock system.
> 
> I haven't yet rebooted this system so I'm not certain that the 
> lilo boot blocks install worked, even after removing the problematic
> SCSI devices.  For obvious reasons I want to wait until I am beside
> the host before rebooting it given apparent boot block install issues.
> 
> If desired I can advise if the reboot was successful later this week.

This second problem looks like a general lilo problem, not specific to the
upgrade, so I am reassigning it to lilo.
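
For other people hitting this, the workaround quoted above can be
sketched as follows. This is only an illustration: the helper name
scsi_remove_line is hypothetical, and it merely builds the command string
from a dmesg "Attached scsi ... disk" line; it does not write anything to
/proc/scsi/scsi itself.

```shell
#!/bin/sh
# Hypothetical helper: turn one dmesg line such as
#   Attached scsi removable disk sde at scsi2, channel 0, id 2, lun 0
# into the matching "scsi remove-single-device HOST CHANNEL ID LUN" string.
# It only prints the string; lines that do not match print nothing.
scsi_remove_line() {
    printf '%s\n' "$1" | sed -n \
        's/^.*at scsi\([0-9][0-9]*\), channel \([0-9][0-9]*\), id \([0-9][0-9]*\), lun \([0-9][0-9]*\).*$/scsi remove-single-device \1 \2 \3 \4/p'
}

scsi_remove_line "Attached scsi removable disk sde at scsi2, channel 0, id 2, lun 0"
# prints: scsi remove-single-device 2 0 2 0
```

As root, and only on a kernel that provides /proc/scsi/scsi, the printed
line would then be redirected there as in the original report before
running lilo again. Double-check the host, channel, id and lun first,
since removing the wrong device would detach a disk that is in use.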

Thanks a lot for your detailed bug report,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here.


