Freeze when using SCSI drive
I have a file server with a 73 GB SCSI drive running off an Adaptec
AIC-7892A controller (MSI mainboard, Duron processor, Debian 2.4.20-k7
stock kernel, aic7xxx driver module).
Occasionally, the file server loses contact with the drive. It's not
the root drive, so the machine itself doesn't freeze, but any process that
accesses the drive hangs in the D state ("uninterruptible" I/O wait) and can't be killed.
At that point the machine is useless as a file server, and only a hard reboot recovers it.
The kernel log messages are below. Basically, the driver
keeps trying to "queue an ABORT message". It dumps a lot of controller state,
then says "Device is disconnected, re-queuing SCB" and tries again. It
never succeeds.
Lately, this has been happening more and more often, which makes me suspect
that some piece of hardware is failing. What should I suspect first: the
controller, the bus, or the drive? The drive also sits in one of those
removable (not hot-swappable) trays, so that could be failing, too. On the
other hand, this happens fairly consistently when I run backups,
so perhaps it is simply triggered by heavy, sustained use.
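To check whether the aborts really line up with backup runs, the timestamps of the ABORT attempts can be pulled out of the kernel log and compared against the backup schedule. Below is a rough Python sketch. It assumes the classic syslog line format seen in the log below ("Mon DD HH:MM:SS host kernel: ..."), with each message on a single line as syslog writes it. The script name and log path in the usage comment are only examples.

```python
import re
import sys
from collections import defaultdict

# Matches aic7xxx lines like:
# "Apr 28 13:38:30 billie kernel: scsi0:0:1:0: Attempting to queue an ABORT message"
# Assumption: classic syslog format "Mon DD HH:MM:SS host kernel: ...".
ABORT_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(scsi\d+:\d+:\d+:\d+): Attempting to queue an ABORT message"
)


def abort_times(lines):
    """Return {device: [timestamp, ...]} for every ABORT attempt found."""
    hits = defaultdict(list)
    for line in lines:
        m = ABORT_RE.match(line)
        if m:
            hits[m.group(2)].append(m.group(1))
    return dict(hits)


if __name__ == "__main__":
    # Example usage (path is illustrative): python abort_times.py < /var/log/kern.log
    for dev, times in abort_times(sys.stdin).items():
        print(f"{dev}: {len(times)} ABORT attempts, first {times[0]}, last {times[-1]}")
```

If the first timestamp of each burst consistently falls inside the backup window, that supports the "heavy sustained load" theory. It does not, by itself, rule out a marginal drive, tray, or cable that only fails under load.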
Is there a known driver bug, or a known hardware bug with this controller?
Can someone interpret the log messages below? Is there a better list to
send this to? Any hints are appreciated.
Thanks,
David Wright
---------------------------------------------------------
Apr 28 13:38:30 billie kernel: scsi0:0:1:0: Attempting to queue an ABORT message
Apr 28 13:38:30 billie kernel: scsi0: Dumping Card State while idle, at SEQADDR 0x168
Apr 28 13:38:30 billie kernel: ACCUM = 0x7, SINDEX = 0x64, DINDEX = 0x65, ARG_2 = 0x0
Apr 28 13:38:30 billie kernel: HCNT = 0x0 SCBPTR = 0x4
Apr 28 13:38:30 billie kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Apr 28 13:38:30 billie kernel: DFCNTRL = 0x0, DFSTATUS = 0x89
Apr 28 13:38:30 billie kernel: LASTPHASE = 0x1, SCSISIGI = 0xe4, SXFRCTL0 = 0x88
Apr 28 13:38:30 billie kernel: SSTAT0 = 0x0, SSTAT1 = 0x0
Apr 28 13:38:30 billie kernel: SCSIPHASE = 0x0
Apr 28 13:38:30 billie kernel: STACK == 0x175, 0x160, 0x0, 0x34
Apr 28 13:38:30 billie kernel: SCB count = 12
Apr 28 13:38:30 billie kernel: Kernel NEXTQSCB = 1
Apr 28 13:38:30 billie kernel: Card NEXTQSCB = 1
Apr 28 13:38:30 billie kernel: QINFIFO entries:
Apr 28 13:38:30 billie kernel: Waiting Queue entries:
Apr 28 13:38:30 billie kernel: Disconnected Queue entries: 4:3 3:6 2:4 0:5 7:0 6:11 5:2 1:7
Apr 28 13:38:30 billie kernel: QOUTFIFO entries:
Apr 28 13:38:30 billie kernel: Sequencer Free SCB List: 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Apr 28 13:38:30 billie kernel: Sequencer SCB Info: 0(c 0x64, s 0x17, l 0, t 0x5) 1(c 0x64, s 0x17, l 0, t 0x7) 2(c 0x64, s 0x17, l 0, t 0x4) 3(c 0x64, s 0x17, l 0, t 0x6) 4(c 0x64, s 0x17, l 0, t 0x3) 5(c 0x64, s 0x17, l 0, t 0x2) 6(c 0x64, s 0x17, l 0, t 0xb) 7(c 0x64, s 0x17, l 0, t 0x0) 8(c 0x0, s 0xff, l 255, t 0xff) 9(c 0x0, s 0xff, l 255, t 0xff) 10(c 0x0, s 0xff, l 255, t 0xff) 11(c 0x0, s 0xff, l 255, t 0xff) 12(c 0x0, s 0xff, l 255, t 0xff) 13(c 0x0, s 0xff, l 255, t 0xff) 14(c 0x0, s 0xff, l 255, t 0xff) 15(c 0x0, s 0xff, l 255, t 0xff) 16(c 0x0, s 0xff, l 255, t 0xff) 17(c 0x0, s 0xff, l 255, t 0xff) 18(c 0x0, s 0xff, l 255, t 0xff) 19(c 0x0, s 0xff, l 255, t 0xff) 20(c 0x0, s 0xff, l 255, t 0xff) 21(c 0x0, s 0xff, l 255, t 0xff) 22(c 0x0, s 0xff, l 255, t 0xff) 23(c 0x0, s 0xff, l 255, t 0xff) 24(c 0x0, s 0xff, l 255, t 0xff) 25(c 0x0, s 0xff, l 255, t 0xff) 26(c 0x0, s 0xff, l 255, t 0xff) 27(c 0x0, s 0xff, l 255, t 0xff) 28(c 0x0, s 0xff, l 255, t 0xff) 29(c 0x0, s 0xff, l 255, t 0xff) 30(c 0x0, s
Apr 28 13:38:30 billie kernel: 0xff, l 255, t 0xff) 31(c 0x0, s 0xff, l 255, t 0xff)
Apr 28 13:38:30 billie kernel: Pending list: 3(c 0x60, s 0x17, l 0), 6(c 0x60, s 0x17, l 0), 4(c 0x60, s 0x17, l 0), 5(c 0x60, s 0x17, l 0), 0(c 0x60, s 0x17, l 0), 11(c 0x60, s 0x17, l 0), 2(c 0x60, s 0x17, l 0), 7(c 0x60, s 0x17, l 0)
Apr 28 13:38:30 billie kernel: Kernel Free SCB list: 10 9 8
Apr 28 13:38:30 billie kernel: DevQ(0:1:0): 0 waiting
Apr 28 13:38:30 billie kernel: DevQ(0:5:0): 0 waiting
Apr 28 13:38:30 billie kernel: (scsi0:A:1:0): Queuing a recovery SCB
Apr 28 13:38:30 billie kernel: scsi0:0:1:0: Device is disconnected, re-queuing SCB
Apr 28 13:38:30 billie kernel: Recovery code sleeping
Apr 28 13:38:35 billie kernel: Recovery code awake
Apr 28 13:38:35 billie kernel: Timer Expired
Apr 28 13:38:35 billie kernel: aic7xxx_abort returns 0x2003
Apr 28 13:38:35 billie kernel: scsi0:0:1:0: Attempting to queue an ABORT message