
disk error -> reset entire USB connection



I have multiple drives in a Vantec HX4 case connected by USB 3.0.  It
seems a disk I/O error* causes the entire USB connection to reset, so
all the drives are remapped to new device names and every existing
connection to the disks in the case breaks.
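
Since the remapping changes the sdX names, it may help to track the
drives by their persistent names instead.  The following is only a
minimal sketch, assuming udev populates /dev/disk/by-id on this system;
the function name stable_names is made up for illustration.

```python
import os


def stable_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent symlink name to the kernel device it points at now."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        # Directory missing (e.g. udev not running): nothing to report.
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if os.path.islink(path):
            # realpath resolves the symlink to the current kernel device node.
            mapping[name] = os.path.realpath(path)
    return mapping


if __name__ == "__main__":
    for name, dev in stable_names().items():
        print(f"{name} -> {dev}")
```

Running it before and after a reset should show which kernel name each
drive ended up with, provided the bridge exposes distinct identifiers
for each drive.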

Is this expected behavior for the Linux kernel?

I'm running linux-image 3.16.39-1~bpo70+1.  I first noticed the
problem within a day of a kernel upgrade that brought a lot of
changes while keeping the same package name:
[UPGRADE] linux-image-3.16.0-0.bpo.4-amd64:amd64
3.16.36-1+deb8u2~bpo70+1 -> 3.16.39-1~bpo70+1
The machine is still on Debian 7.11 aka wheezy.

There are a number of USB changes in the changelog; this one looks in
the same area as my problem:
    - usb: xhci: Fix panic if disconnect

I do seem to have a hardware problem with one of the disks (though I'm
now getting an error for the replacement).  I moved it to a
single-drive case connected by USB and got errors when reading the
same sectors that originally caused trouble.  It also exhibited the
same behavior of resetting the USB connection and remapping the drive,
though in that case only the drive with the problem was affected.

*It is possible the error is happening further up the chain, and that
this causes the reset which in turn causes the reported I/O error.
Since the error seemed to move with the disk, it seems more likely the
error is on the disk.
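
One way to look at the ordering question is to compare the kernel
timestamps of the first USB disconnect and the first I/O error.  This
is a rough sketch, not a diagnosis: the sample lines are taken from the
log in this message with the mail wrapping undone, and the function
name first_event_times and the two event labels are made up for
illustration.

```python
import re

# Kernel timestamp in square brackets, followed by the message text.
TS = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")

# Label -> substring that identifies the event in the message text.
EVENTS = {"usb_disconnect": "USB disconnect", "io_error": "I/O error"}


def first_event_times(lines):
    """Return the kernel timestamp of the first occurrence of each event."""
    first = {}
    for line in lines:
        m = TS.search(line)
        if not m:
            continue
        stamp, msg = float(m.group(1)), m.group(2)
        for label, needle in EVENTS.items():
            if needle in msg:
                # setdefault keeps the earliest line seen for each label.
                first.setdefault(label, stamp)
    return first


SAMPLE = [
    "Feb  2 15:26:50 tempserver kernel: [ 8587.312261] usb 2-3: USB disconnect, device number 4",
    "Feb  2 15:26:50 tempserver kernel: [ 8587.315908] scsi 8:0:0:3: rejecting I/O to offline device",
    "Feb  2 15:26:50 tempserver kernel: [ 8587.315951] end_request: I/O error, dev sdk, sector 802432",
]

if __name__ == "__main__":
    times = first_event_times(SAMPLE)
    print(times)  # {'usb_disconnect': 8587.312261, 'io_error': 8587.315951}
```

In this excerpt the disconnect is logged a few milliseconds before the
first I/O error, but that alone does not settle the cause: the bridge
could be dropping off the bus in reaction to a drive problem that never
reaches the kernel log.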

Here's what the log looks like:
Feb  2 15:26:50 tempserver kernel: [ 8587.312261] usb 2-3: USB
disconnect, device number 4
Feb  2 15:26:50 tempserver kernel: [ 8587.315908] scsi 8:0:0:3:
rejecting I/O to offline device
Feb  2 15:26:50 tempserver kernel: [ 8587.315912] scsi 8:0:0:3: [sdk]
killing request
Feb  2 15:26:50 tempserver kernel: [ 8587.315934] scsi 8:0:0:3: [sdk]
Unhandled error code
Feb  2 15:26:50 tempserver kernel: [ 8587.315936] scsi 8:0:0:3: [sdk]
Feb  2 15:26:50 tempserver kernel: [ 8587.315939] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb  2 15:26:50 tempserver kernel: [ 8587.315941] scsi 8:0:0:3: [sdk] CDB:
Feb  2 15:26:50 tempserver kernel: [ 8587.315943] Write(16): 8a 00 00
00 00 00 00 0c 3e 80 00 00 00 80 00 00
Feb  2 15:26:50 tempserver kernel: [ 8587.315951] end_request: I/O
error, dev sdk, sector 802432
Feb  2 15:26:50 tempserver kernel: [ 8587.315963] md/raid1:md126: Disk
failure on sdk2, disabling device.
Feb  2 15:26:50 tempserver kernel: [ 8587.315963] md/raid1:md126:
Operation continuing on 1 devices.
Feb  2 15:26:50 tempserver kernel: [ 8587.315964] scsi 8:0:0:3:
rejecting I/O to offline device
Feb  2 15:26:50 tempserver kernel: [ 8587.315966] scsi 8:0:0:3: [sdk]
killing request
Feb  2 15:26:50 tempserver kernel: [ 8587.315968] scsi 8:0:0:3:
rejecting I/O to dead device
# many repeats of previous error message
Feb  2 15:26:50 tempserver kernel: [ 8587.316067] scsi 8:0:0:3: [sdk]
Unhandled error code
Feb  2 15:26:50 tempserver kernel: [ 8587.316068] scsi 8:0:0:3: [sdk]
Feb  2 15:26:50 tempserver kernel: [ 8587.316069] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Feb  2 15:26:50 tempserver kernel: [ 8587.316070] scsi 8:0:0:3: [sdk] CDB:
Feb  2 15:26:50 tempserver kernel: [ 8587.316071] Write(16): 8a 00 00
00 00 00 00 0c 3f 00 00 00 00 80 00 00
Feb  2 15:26:50 tempserver kernel: [ 8587.316076] end_request: I/O
error, dev sdk, sector 802560
Feb  2 15:26:50 tempserver kernel: [ 8587.383515] RAID1 conf printout:
Feb  2 15:26:50 tempserver kernel: [ 8587.383518]  --- wd:1 rd:2
Feb  2 15:26:50 tempserver kernel: [ 8587.383520]  disk 0, wo:0, o:1, dev:sdb2
Feb  2 15:26:50 tempserver kernel: [ 8587.383521]  disk 1, wo:1, o:0, dev:sdk2
Feb  2 15:26:50 tempserver kernel: [ 8587.394138] RAID1 conf printout:
Feb  2 15:26:50 tempserver kernel: [ 8587.394140]  --- wd:1 rd:2
Feb  2 15:26:50 tempserver kernel: [ 8587.394142]  disk 0, wo:0, o:1, dev:sdb2
# and then the entire box is reconnected, and all the drives are added
back under new names
Feb  2 15:26:54 tempserver kernel: [ 8591.433907] usb 1-3: new
high-speed USB device number 4 using xhci_hcd
Feb  2 15:26:54 tempserver kernel: [ 8591.730098] usb 2-3: new
SuperSpeed USB device number 5 using xhci_hcd
Feb  2 15:26:54 tempserver kernel: [ 8591.747106] usb 2-3: New USB
device found, idVendor=152d, idProduct=0551
Feb  2 15:26:54 tempserver kernel: [ 8591.747109] usb 2-3: New USB
device strings: Mfr=1, Product=2, SerialNumber=5
Feb  2 15:26:54 tempserver kernel: [ 8591.747110] usb 2-3: Product:
USB to ATA/ATAPI Bridge
Feb  2 15:26:54 tempserver kernel: [ 8591.747111] usb 2-3: Manufacturer: JMicron
Feb  2 15:26:54 tempserver kernel: [ 8591.747112] usb 2-3:
SerialNumber: DA00862620FF
Feb  2 15:26:54 tempserver kernel: [ 8591.749305] usb-storage 2-3:1.0:
USB Mass Storage device detected
Feb  2 15:26:54 tempserver kernel: [ 8591.749555] scsi9 : usb-storage 2-3:1.0
Feb  2 15:26:55 tempserver kernel: [ 8592.747992] scsi 9:0:0:0:
Direct-Access     WDC WD20 01FASS-00W2B0         PQ: 0 ANSI: 5
Feb  2 15:26:55 tempserver kernel: [ 8592.748271] scsi 9:0:0:1:
Direct-Access     WDC WD40 00FYYZ-01UL1B2        PQ: 0 ANSI: 5
Feb  2 15:26:55 tempserver kernel: [ 8592.748540] scsi 9:0:0:2:
Direct-Access     WDC WD20 EARS-00MVWB0          PQ: 0 ANSI: 5
Feb  2 15:26:55 tempserver kernel: [ 8592.748777] scsi 9:0:0:3:
Direct-Access     WDC WD40 01FFSX-68JNUN0        PQ: 0 ANSI: 5
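
As a cross-check, the CDB bytes in the log can be decoded to confirm
that they match the reported sectors.  The sketch below assumes the
standard 16-byte READ/WRITE layout (opcode in byte 0, big-endian LBA
in bytes 2-9, transfer length in bytes 10-13) and 512-byte logical
blocks; the helper name decode_rw16 is made up for illustration.

```python
def decode_rw16(cdb_hex):
    """Decode a 16-byte READ(16)/WRITE(16) CDB given as a hex string.

    Returns (opcode, lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are accepted
    if len(cdb) != 16:
        raise ValueError(f"expected 16 bytes, got {len(cdb)}")
    opcode = cdb[0]                             # 0x8a is WRITE(16)
    lba = int.from_bytes(cdb[2:10], "big")      # 8-byte logical block address
    length = int.from_bytes(cdb[10:14], "big")  # 4-byte transfer length
    return opcode, lba, length


if __name__ == "__main__":
    first = decode_rw16("8a 00 00 00 00 00 00 0c 3e 80 00 00 00 80 00 00")
    second = decode_rw16("8a 00 00 00 00 00 00 0c 3f 00 00 00 00 80 00 00")
    print(first)   # (138, 802432, 128)
    print(second)  # (138, 802560, 128)
```

The decoded LBAs equal the sectors in the end_request lines (802432 and
802560), and the second write starts exactly 128 blocks after the
first, so these are two back-to-back 64 KiB writes that were in flight
when the device went away.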

