
qla2xxx mailbox timeout crashes lenny



When I run rsnapshot backups from an IBM fibre channel disk system (using LVM2 snapshots) to a Promise fibre channel disk system, the qla2xxx driver causes the system to crash and reboot. I'm running Lenny with kernel 2.6.22-3-vserver-amd64 and the stock Debian qla2xxx module. I've already replaced the QLogic HBA and the QLogic switch that connects to the storage. Three other servers with similar hardware and the same Debian version don't have this problem.

Feb  6 13:40:28 hqhost kernel: qla2xxx_eh_abort(0): aborting sp ffff8101d01aa7c0 from RISC. pid=111928.
Feb  6 13:40:58 hqhost kernel: qla2x00_mailbox_command(0): timeout calling abort_isp
Feb  6 13:40:58 hqhost kernel: qla2x00_mailbox_command(0): timeout calling abort_isp
Feb  6 13:40:58 hqhost kernel: qla2xxx 0000:08:01.0: Mailbox command timeout occured. Issuing ISP abort.
Feb  6 13:40:58 hqhost kernel: qla2xxx 0000:08:01.0: Performing ISP error recovery - ha= ffff810225a84530.
Feb  6 13:40:58 hqhost kernel: scsi(0): **** Load RISC code ****
Feb  6 13:40:58 hqhost kernel: scsi(0): Verifying Checksum of loaded RISC code.
Feb  6 13:40:58 hqhost kernel: scsi(0): Checksum OK, start firmware.
Feb  6 13:40:58 hqhost kernel: scsi(0): Issue init firmware.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous P2P MODE received.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous LOOP UP (4 Gbps).
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: LOOP UP detected (4 Gbps).
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE.
Feb  6 13:40:59 hqhost kernel: scsi(0): Port database changed ffff 0006 0000.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE ignored 0000/0004/0600.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE ignored 0000/0007/0b00.
Feb  6 13:40:59 hqhost kernel: scsi(0): F/W Ready - OK
Feb  6 13:40:59 hqhost kernel: scsi(0): fw_state=3 curr time=1001756ca.
Feb  6 13:40:59 hqhost kernel: qla2x00_restart_isp(): Start configure loop, status = 0
Feb  6 13:40:59 hqhost kernel: scsi(0): Configure loop -- dpc flags =0x4080048
Feb  6 13:40:59 hqhost kernel: scsi(0): RSCN queue entry[0] = [00/000000].
Feb  6 13:40:59 hqhost kernel: scsi(0): device_resync: rscn overflow.
Feb  6 13:40:59 hqhost kernel: scsi(0): RFT_ID failed, completion status (280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register FC-4 TYPE failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): RFF_ID failed, completion status (280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register FC-4 Features failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): RNN_ID failed, completion status (280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register Node Name failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): GID_PT failed, completion status (180).
Feb  6 13:40:59 hqhost kernel: scsi(0): GA_NXT failed, rejected request:
Feb  6 13:40:59 hqhost kernel:  0   1   2   3   4   5   6   7   8   9  Ah  Bh  Ch  Dh  Eh  Fh
Feb  6 13:40:59 hqhost kernel: --------------------------------------------------------------
Feb  6 13:40:59 hqhost kernel: 14  00  00  00  00  10  97  23  02  00  00  00  10  08  00  00
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: SNS scan failed -- assuming zero-entry result...
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-0 - port retry count: 29 remaining
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-1 - port retry count: 29 remaining
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-2 - port retry count: 29 remaining
Feb  6 13:40:59 hqhost kernel: qla24xx_fabric_logout(0): failed to complete IOCB -- completion status (2)  ioparam=0/810031.
Feb  6 13:40:59 hqhost kernel: scsi(0): LOOP READY
Feb  6 13:40:59 hqhost kernel: qla2x00_restart_isp(): Configure loop done, status = 0x0
Feb  6 13:40:59 hqhost kernel: APIC error on CPU5: 00(40)
Feb  6 13:40:59 hqhost kernel: qla2x00_abort_isp(0): exiting.
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): finished abort_isp
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): finished abort_isp
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): **** FAILED. mbx0=54, mbx1=0, mbx2=2397, cmd=54 ****
Feb  6 13:40:59 hqhost kernel: qla2x00_issue_iocb(0): failed rval 0x100
Feb  6 13:40:59 hqhost kernel: qla2x00_issue_iocb(0): failed rval 0x100
Feb  6 13:40:59 hqhost kernel: qla24xx_abort_command(0): failed to issue IOCB (100).
Feb  6 13:40:59 hqhost kernel: qla2xxx_eh_abort(0): abort_command mbx failed.
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:0): Abort command issued -- 0 1b538 2002.
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-0 - port retry count: 28 remaining
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-1 - port retry count: 28 remaining
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-2 - port retry count: 28 remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-0 - port retry count: 27 remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-1 - port retry count: 27 remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-2 - port retry count: 27 remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-0 - port retry count: 26 remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-1 - port retry count: 26 remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-2 - port retry count: 26 remaining
...(25 more port retries)...
Feb  6 13:41:33 hqhost kernel:  rport-0:0-0: blocked FC remote port time out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel:  rport-0:0-4: blocked FC remote port time out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel:  rport-0:0-5: blocked FC remote port time out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:0): DEVICE RESET ISSUED.
Feb  6 13:41:33 hqhost kernel: qla2x00_wait_for_hba_online return_status=0
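
To condense a log like the one above, something along these lines might help. It is only a sketch: the function name summarize and both regular expressions are made up for illustration and assume the syslog layout shown here ("Mon DD HH:MM:SS host kernel: message"). It collects the timestamps of the abort_isp mailbox timeouts and the lowest "port retry count" seen for each fcport.

```python
import re

# Assumed syslog layout: "Feb  6 13:40:58 hqhost kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+kernel:\s?(.*)$")
# Matches e.g. "fcport-0 - port retry count: 29 remaining"
RETRY_RE = re.compile(r"(fcport-\d+) - port retry count: (\d+) remaining")


def summarize(lines):
    """Return (mailbox timeout timestamps, lowest retry count per fcport)."""
    timeouts = []
    lowest = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        stamp, _host, msg = m.groups()
        if "timeout calling abort_isp" in msg:
            timeouts.append(stamp)
        r = RETRY_RE.search(msg)
        if r:
            port, remaining = r.group(1), int(r.group(2))
            lowest[port] = min(remaining, lowest.get(port, remaining))
    return timeouts, lowest


# Sample lines copied from the log above.
sample = [
    "Feb  6 13:40:58 hqhost kernel: qla2x00_mailbox_command(0): timeout calling abort_isp",
    "Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-0 - port retry count: 29 remaining",
    "Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-0 - port retry count: 26 remaining",
]
timeouts, lowest = summarize(sample)
print(timeouts, lowest)
```

Fed the full log file instead of the three sample lines, this would show how far each port's retry counter ran down before the remote ports were removed.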


Is this a hardware problem, a kernel problem, a QLogic driver problem, or perhaps all three at once? Thanks in advance,
--
Daniel Bakken
Systems Administrator

Economic Modeling Specialists Inc
1187 Alturas Drive
Moscow, Idaho 83843
(208) 883-3500 x1016 - office
(208) 596-1446 - cell