
SCSI host adapter "gone" after kernel upgrade



This is a rather long description of a problem, with three questions at
the bottom.

Hello all. I have stumbled upon a problem I do not quite understand. A
while back I upgraded a server to an Asus P4C800 Deluxe mainboard with all
the goodies (P4 with hyperthreading and fast RAM). It then failed to
recognize the old boot partition on my SCSI hard drive. After many woes, I
gave up and duplicated the boot partition to an IDE drive, which booted
the system fine and let me access the regular SCSI drives. 
The server has since suffered sporadic hangs and kernel oopses, with no
pattern or common application behind them. I flashed the BIOS with Asus'
latest revision (v1010). I then built a new kernel (2.4.21) with SMP
disabled to see if hyperthreading was causing the problems. Disabling SMP
was the only configuration difference from the old kernel (2.4.20). I
disabled HT in the BIOS and tried to boot the new kernel. It failed to
initialize my SCSI host adapter; more precisely, the driver reported that
it tried to use IRQ 0:

<from dmesg output>
PCI: Enabling device 02:0a.0 (0000 -> 0003)
PCI: No IRQ known for interrupt pin A of device 02:0a.0. Please try using pci=biosirq.
sym.2.10.0: setting PCI_COMMAND_MASTER PCI_COMMAND_PARITY...
sym0: <895> rev 0x1 on pci bus 2 device 10 function 0 irq 0
sym0: Tekram NVRAM, ID 7, Fast-40, SE, parity checking
sym0: request irq 0 failure
sym0: giving up ...
<end of snip>

Naturally, the rest of the boot failed, since the kernel couldn't find any
drives. The old kernel, with SMP, produces a much longer dmesg output that
includes SMP and APIC messages. There, the host adapter gets IRQ 22:

<from dmesg output>
PCI: Enabling device 02:0a.0 (0000 -> 0003)
sym.2.10.0: setting PCI_COMMAND_MASTER PCI_COMMAND_PARITY...
sym0: <895> rev 0x1 on pci bus 2 device 10 function 0 irq 22
sym0: Tekram NVRAM, ID 7, Fast-40, SE, parity checking
<end of snip>
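
To compare the two boots side by side, the IRQ reported in the sym banner
line can be pulled out of a saved dmesg capture. This is only a minimal
sketch: the function name find_sym_irqs and the two sample strings are
made up for illustration, and the pattern assumes the banner format shown
in the snippets above.

```python
import re

# Matches the sym driver banner line as it appears in the snippets above, e.g.
# "sym0: <895> rev 0x1 on pci bus 2 device 10 function 0 irq 22"
SYM_LINE = re.compile(
    r"^(sym\d+): <(\w+)> rev \S+ on pci bus (\d+) device (\d+) "
    r"function (\d+) irq (\d+)$"
)


def find_sym_irqs(dmesg_text):
    """Return {controller_name: irq} for every sym banner line found."""
    irqs = {}
    for line in dmesg_text.splitlines():
        match = SYM_LINE.match(line.strip())
        if match:
            irqs[match.group(1)] = int(match.group(6))
    return irqs


# Sample input copied from the two dmesg excerpts (hypothetical variable names).
UP_BOOT = """\
PCI: Enabling device 02:0a.0 (0000 -> 0003)
sym0: <895> rev 0x1 on pci bus 2 device 10 function 0 irq 0
sym0: request irq 0 failure
"""

SMP_BOOT = """\
PCI: Enabling device 02:0a.0 (0000 -> 0003)
sym0: <895> rev 0x1 on pci bus 2 device 10 function 0 irq 22
sym0: Tekram NVRAM, ID 7, Fast-40, SE, parity checking
"""

if __name__ == "__main__":
    print("single-CPU kernel:", find_sym_irqs(UP_BOOT))
    print("SMP kernel:       ", find_sym_irqs(SMP_BOOT))
```

Run against full captures from both kernels, this makes it easy to spot
which controllers end up with IRQ 0.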

This makes sense when looking further up in the dmesg:
PCI->APIC IRQ transform: (B2,I10,P0) -> 22
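
For reference, the transform line can be decoded to confirm it refers to
the same device as 02:0a.0. The sketch below assumes that B is the bus
number, I is the device (slot) number in decimal, and P is a zero-based
interrupt pin index (so P0 would be pin A, matching the "interrupt pin A"
message from the failing boot). The helper name parse_transform is
hypothetical.

```python
import re

# Pattern for the line seen in the SMP kernel's dmesg output.
TRANSFORM = re.compile(
    r"PCI->APIC IRQ transform: \(B(\d+),I(\d+),P(\d+)\) -> (\d+)"
)


def parse_transform(line):
    """Decode one transform line into a PCI address, pin letter and IRQ.

    Assumption: P is a zero-based pin index (0 = A, 1 = B, ...).
    Returns None if the line does not match.
    """
    match = TRANSFORM.search(line)
    if match is None:
        return None
    bus, slot, pin, irq = (int(group) for group in match.groups())
    return {
        # bus:device in the same hex form the "PCI: Enabling device" line uses
        "address": f"{bus:02x}:{slot:02x}",
        "pin": chr(ord("A") + pin),
        "irq": irq,
    }


if __name__ == "__main__":
    print(parse_transform("PCI->APIC IRQ transform: (B2,I10,P0) -> 22"))
```

Under those assumptions, (B2,I10,P0) is bus 02, device 0a, pin A, which is
the SCSI adapter, routed to IRQ 22.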

Neither the line above nor any other APIC reference appears when booting
the single-CPU kernel. The machine worked fine before I even got the darn
P4, so I can't see why it should depend on APIC. The major glitch now
(which I only just noticed, so I do not know how long it has been this
way) is that the SCSI BIOS setup screen no longer appears, so I can't see
what IRQ the adapter thinks it should be using.
The obvious conclusion is that the BIOS upgrade changed some option that
hands control to the SCSI BIOS at some point during boot. I looked
through the various options, but could not see which one that would be.
The machine is working now, with the old SMP kernel and HT enabled, but I
would like this resolved so I can attempt kernel upgrades in the future,
and I can't take it down for any extended time since this is a production
server.

The questions I have are:

 - Is APIC exclusively an SMP feature? If not, why don't I get any APIC
messages when booting single-CPU?
 - Does anyone know what BIOS option could preempt the SCSI BIOS screen?
 - Why does the adapter accept IRQ 22 with APIC, and attempt IRQ 0
without, when it had some self-decided IRQ before the P4?

Thanks for your help.

-- 
Erik Rask, systems administrator @ AB Strakt
"There is no normal life. There's just life. So get on with it."


