
Serious SCSI problem on Lenny networkinstall image



Hi list,

The Debian 5 network install image hangs during boot at the "detecting hardware to find cdrom drive" message.
The console (Alt-F4) shows the messages in the attachment, from which I can tell the SCSI subsystem is not doing well.

So here we go:

First machine is my (very) old dual-P333 with Adaptec 29160 SCSI Ultra160 controller in a 33 MHz PCI slot.
The SCSI controller is a bit much for this motherboard, but it has always worked reliably.

The problems on this machine seem to start with the following log lines:

  Mar  9 02:19:17 kernel: [   28.548113] scsi 0:0:0:0: Attempting to queue an ABORT message
  Mar  9 02:19:17 kernel: [   28.548148] CDB: 0x12 0x0 0x0 0x0 0x24 0x0
  Mar  9 02:19:17 kernel: [   28.548218] scsi 0:0:0:0: Command already completed
  Mar  9 02:19:17 kernel: [   28.548235] aic7xxx_abort returns 0x2002
and 10 seconds later:
  Mar  9 02:19:27 kernel: [   38.548098] scsi 0:0:0:0: Attempting to queue an ABORT message
  Mar  9 02:19:27 kernel: [   38.548132] CDB: 0x0 0x0 0x0 0x0 0x0 0x0
  Mar  9 02:19:27 kernel: [   38.548216] scsi0: At time of recovery, card was paused
  Mar  9 02:19:27 kernel: [   38.548244] >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
  Mar  9 02:19:27 kernel: [   38.548255] scsi0: Dumping Card State in Message-in phase, at SEQADDR 0x42
  Mar  9 02:19:27 kernel: [   38.548271] Card was paused
  Mar  9 02:19:27 kernel: [   38.548288] ACCUM = 0x0, SINDEX = 0x71, DINDEX = 0xe4, ARG_2 = 0x0
  Mar  9 02:19:27 kernel: [   38.548304] HCNT = 0x0 SCBPTR = 0x0
  Mar  9 02:19:27 kernel: [   38.548318] SCSIPHASE[0x0] SCSISIGI[0x0] ERROR[0x0]
etcetera, etcetera

The whole syslog: syslog.pii333 in the attachment
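
For readers following along, the two CDB lines in the excerpt above can be decoded by hand. The sketch below is only an illustration, not part of the original report: the helper names `parse_cdb` and `describe_cdb` are made up here, and the opcode table covers only the two opcodes that appear in the log (0x12 is INQUIRY and 0x00 is TEST UNIT READY in the SCSI command set). It assumes the log line format shown above.

```python
# Minimal sketch: decode the 6-byte CDBs printed by the kernel in the log above.
# The function names are hypothetical; only the two opcodes seen in the log are listed.
OPCODES = {0x00: "TEST UNIT READY", 0x12: "INQUIRY"}


def parse_cdb(line):
    """Return the CDB bytes from a kernel log line containing 'CDB: ...'."""
    _, _, tail = line.partition("CDB:")
    return [int(tok, 16) for tok in tail.split()]


def describe_cdb(cdb):
    """Give a short human-readable description of a CDB."""
    if not cdb:
        return "no CDB found"
    name = OPCODES.get(cdb[0], "unknown opcode 0x%02x" % cdb[0])
    if cdb[0] == 0x12 and len(cdb) == 6:
        # For a 6-byte INQUIRY, bytes 3-4 carry the allocation length.
        alloc = (cdb[3] << 8) | cdb[4]
        return "%s, allocation length %d bytes" % (name, alloc)
    return name


if __name__ == "__main__":
    first = "kernel: [   28.548148] CDB: 0x12 0x0 0x0 0x0 0x24 0x0"
    second = "kernel: [   38.548132] CDB: 0x0 0x0 0x0 0x0 0x0 0x0"
    print(describe_cdb(parse_cdb(first)))
    print(describe_cdb(parse_cdb(second)))
```

Read this way, the first aborted command is a standard INQUIRY asking for 36 bytes and the second is a TEST UNIT READY, that is, the very first commands sent to a device during probing.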

Second machine is a dual-XEON 2200 workstation, Celsius 670 by Fujitsu-Siemens, the most stable machine I ever had.
It has an internal Qlogic 1216x SCSI Ultra160 controller. Here I got the logfile via a USB stick.

The problems here:
  Mar  9 02:23:15 kernel: [   54.432009] scsi(0): mailbox timed out, mailbox0 8020, ictrl 0006, istatus 6000
  Mar  9 02:23:15 kernel: [   54.432025] qla1280_mailbox_command: Command failed, mailbox0 = 0x0015, mailbox_out0 = 0x8020, status = 0x6000
  Mar  9 02:23:15 kernel: [   54.432033] m0 8020, m1 0000, m2 0000, m3 1809
  Mar  9 02:23:15 kernel: [   54.432038] m4 0034, m5 0021, m6 0201, m7 1000
  Mar  9 02:23:15 kernel: [   54.432042] scsi(0:0:0:0): Unable to abort command!
  Mar  9 02:23:15 kernel: [   54.432055] scsi(0): Resetting Cmnd=0xf78b8d80, Handle=0x00000001, action=0x2
  Mar  9 02:23:15 kernel: [   54.432060] scsi(0:0:0:0): Queueing device reset command.

The whole syslog: syslog.xeon in the attachment

Before you run through the standard chain of questions about my SCSI setup: a debootstrap install of Debian 5 (via an Ubuntu live CD) followed by apt-get update/upgrade produced a running system on both machines, without problems.
So the problem might already be fixed in the current packages.

Both machines have been running Ubuntu Server 6.06 for quite some time, and they were stable.

So it does not seem to be hardware related.

I guess something in the network install image is broken, or at least different from the installed system.
I have not tried a "normal" boot image.

From my development experience I would say there is a bug in a layer above the SCSI device subsystem.

Regards,

Jacco

Attachment: debian5syslog.tar.gz
Description: GNU Zip compressed data

