
Freeze when using SCSI drive




I have a file server with a 73 GB SCSI drive running off an Adaptec AIC-7892A controller (MSI mainboard, Duron processor, Debian 2.4.20-k7 stock kernel, aic7xxx driver module).

Occasionally, the file server loses contact with the drive. It's not the root drive, so the machine doesn't freeze, but any process that accesses it hangs in the D state ("uninterruptible I/O wait") and can't be killed. The file server ceases to be useful as such, necessitating a hard reboot.
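For anyone wanting to confirm which processes are stuck, here is a minimal sketch that scans /proc for processes in state D. It assumes a Linux /proc layout and a Python 3 interpreter; the function names `parse_stat` and `blocked_processes` are made up for illustration and are not part of any existing tool.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren].strip())
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state

def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable; skip it
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling `blocked_processes()` while the drive is unreachable should list the hung processes; `ps` reports the same state letter in its STAT column.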

The messages from the kernel log appear below. Basically, the driver keeps trying to "queue an ABORT message". It prints a bunch of stuff, then says "Device is disconnected, re-queuing SCB" and tries again. It never succeeds.

Lately, this has happened more and more often. Perhaps this is because some piece of hardware is failing. What should I suspect first: the controller, the bus, or the drive? Also, the drive is in one of those removable (not hot-swappable) trays, so that could fail, too. On the other hand, this behavior happens pretty consistently when I do a backup, so perhaps it is just triggered by heavy, sustained use.
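One way to check whether the aborts really cluster around backup times is to count the "Attempting to queue an ABORT message" lines per hour and per device. The sketch below assumes syslog-style lines in the format shown in the log excerpt in this post; the name `count_aborts` and the regular expression are my own illustration, not anything the driver provides.

```python
import re
from collections import Counter

# Matches lines such as:
# Apr 28 13:38:30 billie kernel: scsi0:0:1:0: Attempting to queue an ABORT message
ABORT_RE = re.compile(
    r"^(\w{3}\s+\d+) (\d{2}):\d{2}:\d{2} \S+ kernel: "
    r"(scsi\d+:\d+:\d+:\d+): Attempting to queue an ABORT message"
)

def count_aborts(lines):
    """Count ABORT attempts keyed by (day, hour, device)."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.match(line)
        if match:
            day, hour, device = match.groups()
            counts[(day, hour, device)] += 1
    return counts
```

Feeding it the lines of the kernel log (for example via `open(path)`, where the log location depends on the syslog configuration) gives a table that can be compared against the backup schedule.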

Is there a known driver bug? A known hardware bug with this controller? Can someone interpret the log messages below? Is there a better list to go to with this data? Any hints are appreciated.

Thanks,
David Wright

---------------------------------------------------------
Apr 28 13:38:30 billie kernel: scsi0:0:1:0: Attempting to queue an ABORT message
Apr 28 13:38:30 billie kernel: scsi0: Dumping Card State while idle, at SEQADDR 0x168
Apr 28 13:38:30 billie kernel: ACCUM = 0x7, SINDEX = 0x64, DINDEX = 0x65, ARG_2 = 0x0
Apr 28 13:38:30 billie kernel: HCNT = 0x0 SCBPTR = 0x4
Apr 28 13:38:30 billie kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Apr 28 13:38:30 billie kernel:  DFCNTRL = 0x0, DFSTATUS = 0x89
Apr 28 13:38:30 billie kernel: LASTPHASE = 0x1, SCSISIGI = 0xe4, SXFRCTL0 = 0x88
Apr 28 13:38:30 billie kernel: SSTAT0 = 0x0, SSTAT1 = 0x0
Apr 28 13:38:30 billie kernel: SCSIPHASE = 0x0
Apr 28 13:38:30 billie kernel: STACK == 0x175, 0x160, 0x0, 0x34
Apr 28 13:38:30 billie kernel: SCB count = 12
Apr 28 13:38:30 billie kernel: Kernel NEXTQSCB = 1
Apr 28 13:38:30 billie kernel: Card NEXTQSCB = 1
Apr 28 13:38:30 billie kernel: QINFIFO entries:
Apr 28 13:38:30 billie kernel: Waiting Queue entries:
Apr 28 13:38:30 billie kernel: Disconnected Queue entries: 4:3 3:6 2:4 0:5 7:0 6:11 5:2 1:7
Apr 28 13:38:30 billie kernel: QOUTFIFO entries:
Apr 28 13:38:30 billie kernel: Sequencer Free SCB List: 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Apr 28 13:38:30 billie kernel: Sequencer SCB Info: 0(c 0x64, s 0x17, l 0, t 0x5) 1(c 0x64, s 0x17, l 0, t 0x7) 2(c 0x64, s 0x17, l 0, t 0x4) 3(c 0x64, s 0x17, l 0, t 0x6) 4(c 0x64, s 0x17, l 0, t 0x3) 5(c 0x64, s 0x17, l 0, t 0x2) 6(c 0x64, s 0x17, l 0, t 0xb) 7(c 0x64, s 0x17, l 0, t 0x0) 8(c 0x0, s 0xff, l 255, t 0xff) 9(c 0x0, s 0xff, l 255, t 0xff) 10(c 0x0, s 0xff, l 255, t 0xff) 11(c 0x0, s 0xff, l 255, t 0xff) 12(c 0x0, s 0xff, l 255, t 0xff) 13(c 0x0, s 0xff, l 255, t 0xff) 14(c 0x0, s 0xff, l 255, t 0xff) 15(c 0x0, s 0xff, l 255, t 0xff) 16(c 0x0, s 0xff, l 255, t 0xff) 17(c 0x0, s 0xff, l 255, t 0xff) 18(c 0x0, s 0xff, l 255, t 0xff) 19(c 0x0, s 0xff, l 255, t 0xff) 20(c 0x0, s 0xff, l 255, t 0xff) 21(c 0x0, s 0xff, l 255, t 0xff) 22(c 0x0, s 0xff, l 255, t 0xff) 23(c 0x0, s 0xff, l 255, t 0xff) 24(c 0x0, s 0xff, l 255, t 0xff) 25(c 0x0, s 0xff, l 255, t 0xff) 26(c 0x0, s 0xff, l 255, t 0xff) 27(c 0x0, s 0xff, l 255, t 0xff) 28(c 0x0, s 0xff, l 255, t 0xff) 29(c 0x0, s 0xff, l 255, t 0xff) 30(c 0x0, s
Apr 28 13:38:30 billie kernel: 0xff, l 255, t 0xff) 31(c 0x0, s 0xff, l 255, t 0xff)
Apr 28 13:38:30 billie kernel: Pending list: 3(c 0x60, s 0x17, l 0), 6(c 0x60, s 0x17, l 0), 4(c 0x60, s 0x17, l 0), 5(c 0x60, s 0x17, l 0), 0(c 0x60, s 0x17, l 0), 11(c 0x60, s 0x17, l 0), 2(c 0x60, s 0x17, l 0), 7(c 0x60, s 0x17, l 0)
Apr 28 13:38:30 billie kernel: Kernel Free SCB list: 10 9 8
Apr 28 13:38:30 billie kernel: DevQ(0:1:0): 0 waiting
Apr 28 13:38:30 billie kernel: DevQ(0:5:0): 0 waiting
Apr 28 13:38:30 billie kernel: (scsi0:A:1:0): Queuing a recovery SCB
Apr 28 13:38:30 billie kernel: scsi0:0:1:0: Device is disconnected, re-queuing SCB
Apr 28 13:38:30 billie kernel: Recovery code sleeping
Apr 28 13:38:35 billie kernel: Recovery code awake
Apr 28 13:38:35 billie kernel: Timer Expired
Apr 28 13:38:35 billie kernel: aic7xxx_abort returns 0x2003
Apr 28 13:38:35 billie kernel: scsi0:0:1:0: Attempting to queue an ABORT message


