
Problems with esp SCSI driver



Hi,
   I'm having problems with the esp SCSI driver on a Sun E4000.  I'm
running a patched version of 2.6.11.9 with two disks and a CDROM attached
to esp0 and 7 disks attached to esp1.  I'm running software RAID 5 on
the 7 disk set.  Under high load (the first time when trying to use
debootstrap on the RAID and the second time when rebuilding) it seems to
work fine for about 20 minutes and then:

[ partial log as machine is remote and problem leaves the system largely
unusable and unable to reboot ]

esp1: Aborting command
esp1: dumping state
esp1: dma -- cond_reg<b2bf8b14> addr<f1300000>
esp1: SW [sreg<11> sstep<04> ireg<10>]
esp1: HW reread [sreg<11> sstep<cc> ireg<00>]
esp1: current command [tgt<0a> lun<00> pphase<CLUELESS> cphase<DATAIN>]
esp1: disconnected [tgt<09> lun<00> pphase<FREEING> cphase<FREEING>]
esp1: Aborting command
esp1: dumping state   
esp1: dma -- cond_reg<b2bf8b14> addr<f1300000>
esp1: SW [sreg<11> sstep<04> ireg<10>]
esp1: HW reread [sreg<11> sstep<cc> ireg<00>]
esp1: current command [tgt<0a> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp1: disconnected [tgt<09> lun<00> pphase<FREEING> cphase<FREEING>]

<... message repeated a number of times ...>

esp1: Resetting scsi bus
esp1: Gross error sreg=40
esp1: SCSI bus reset interrupt
esp1: DMA error b2bf8a03
esp1: Resetting scsi bus
esp1: SCSI bus reset interrupt
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 10 lun 0
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 9 lun 0
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 8 lun 0
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 14 lun 0
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 13 lun 0
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 12 lun 0
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 11 lun 0
SCSI error : <1 0 10 0> return code = 0x2
end_request: I/O error, dev sde, sector 5335680
raid5: Disk failure on sde1, disabling device. Operation continuing on 5 devices
scsi1 (10:0): rejecting I/O to offline device

The RAID then promptly fails each disk in turn and renders /dev/md0
unusable.  After this has happened, none of the disks in the set can be
accessed: dd if=/dev/sdi1 of=/dev/null gives an error saying the
device cannot be opened.  Each time it has failed while accessing a
different disk, and I believe the disks are OK.  The machine fails to
reboot cleanly and requires a hardware power cycle.
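
For anyone wanting to repeat the dd-style open/read check across every
member of the set in one go, here is a minimal sketch in Python.  The
function name probe_devices and the 512-byte read size are my own
choices for illustration, and the device paths are whatever you pass on
the command line; nothing here is specific to the esp driver.

```python
import os
import sys


def probe_devices(paths, nbytes=512):
    """Try to open each path read-only and read nbytes from it.

    Returns a dict mapping each path to None on success, or to the
    error message if the open or read failed.
    """
    results = {}
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
            try:
                os.read(fd, nbytes)
            finally:
                os.close(fd)
            results[path] = None
        except OSError as exc:
            # e.g. the "cannot be opened" case for an offlined device
            results[path] = exc.strerror or str(exc)
    return results


if __name__ == "__main__":
    # Hypothetical usage: python probe.py /dev/sde1 /dev/sdi1
    for path, err in probe_devices(sys.argv[1:]).items():
        print(f"{path}: {'ok' if err is None else err}")
```

This only confirms whether a device can be opened and read at all; it
does not exercise the bus under load the way a rebuild does.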

Can anyone suggest anything?  Searching the web only gives a few vague
suggestions to check the hardware and termination.

Cheers,
 - Martin

-- 
Martin
inkubus@interalpha.co.uk
"Seasons change, things come to pass"


