[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Debugging an USB array issue



Hello,

on a Debian bullseye uptodate system [1], I experiment frequent (every
3-4 hours on heavy load) disk disconnections from a md RAID10 array with
4 drives connected to an USB 10000M adapter [2].

Errors do not look like a timeout, but like a DMA error [3].

Immediately after, the disk reappears as a new drive name and can be
re-added quickly to the md RAID array (I am doing those tests with a
read-only mounted filesystem for obvious reasons).

Initially, I was wondering if it was maybe a disk doing a too long
recovery procedure, but it is to be noted that it's not always the same
disk which has an error, and smartctl -a shows no recorded errors for
any of the 4 drives [4]. The drives are connected to a SATA-to-USB
enclosure [6].

This is on a 3.1 USB PCI-Express card [5].

I already applied this work-around (which does not seem to apply to a
non-idle system):
   echo -1 > /sys/module/usbcore/parameters/autosuspend

What would be your recommandations?  I have thought about downgrading to
a slower port (it should not be much different with 5000M), changing the
cable, or maybe it's the enclosure?

Or is this a known issue (maybe with the xhci_hd driver) and I should
try another driver?

Thank you for any idea or pointer.



[1] Linux video 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
[2] 
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
        |__ Port 2: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/4p, 10000M
            |__ Port 3: Dev 8, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 1: Dev 6, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 4: Dev 10, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 2: Dev 7, If 0, Class=Mass Storage, Driver=uas, 10000M
    |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
[3]
Mar 15 17:08:06 video kernel: [ 6607.383180] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Mar 15 17:08:06 video kernel: [ 6607.386754] DMAR: DRHD: handling fault status reg 3
Mar 15 17:08:06 video kernel: [ 6607.386762] DMAR: [DMA Write] Request device [01:00.0] PASID ffffffff fault addr f98be000 [fault reason 05] PTE Write access is not set
Mar 15 17:08:06 video kernel: [ 6607.386774] sd 18:0:0:0: [sde] tag#5 data cmplt err -75 uas-tag 1 inflight: CMD
Mar 15 17:08:06 video kernel: [ 6607.386780] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 5e 1d 88 00 00 00 01 00 00 00
Mar 15 17:08:06 video kernel: [ 6607.479406] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:06 video kernel: [ 6607.479708] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
Mar 15 17:08:06 video kernel: [ 6607.510551] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
[ ... many ... ]
Mar 15 17:08:13 video kernel: [ 6614.443826] sd 18:0:0:0: [sde] tag#2 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
Mar 15 17:08:13 video kernel: [ 6614.443829] sd 18:0:0:0: [sde] tag#2 CDB: ATA command pass through(12)/Blank a1 08 2e d0 01 00 4f c2 00 b0 00 00
Mar 15 17:08:13 video kernel: [ 6614.457969] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:13 video kernel: [ 6614.458274] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
[ ... many ... ]
Mar 15 17:08:25 video kernel: [ 6626.497696] sd 18:0:0:0: [sde] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=19s
Mar 15 17:08:25 video kernel: [ 6626.497725] sd 18:0:0:0: [sde] tag#5 Sense Key : Illegal Request [current]
Mar 15 17:08:25 video kernel: [ 6626.497731] sd 18:0:0:0: [sde] tag#5 Add. Sense: Invalid command operation code
Mar 15 17:08:25 video kernel: [ 6626.497739] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 5e 1d 88 00 00 00 01 00 00 00
Mar 15 17:08:25 video kernel: [ 6626.497746] blk_update_request: critical target error, dev sde, sector 5873960960 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Mar 15 17:08:25 video kernel: [ 6626.497755] md/raid10:md0: sde: rescheduling sector 11747394560
Mar 15 17:08:25 video kernel: [ 6626.497801] usb 3-1.1.4: stat urb: no pending cmd for uas-tag 3
Mar 15 17:08:25 video kernel: [ 6626.497807] md/raid10:md0: sdd: redirecting sector 11747394560 to another mirror
Mar 15 17:08:25 video kernel: [ 6626.519426] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:25 video kernel: [ 6626.519719] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
Mar 15 17:08:25 video kernel: [ 6626.550583] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:25 video kernel: [ 6626.550875] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
Mar 15 17:08:25 video kernel: [ 6626.562228] usb 3-1.1.4: USB disconnect, device number 9
[ ... ]
Mar 15 17:08:25 video kernel: [ 6626.717643] sd 18:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Mar 15 17:08:25 video kernel: [ 6626.717738] scsi 18:0:0:0: rejecting I/O to dead device
Mar 15 17:08:27 video kernel: [ 6628.560733] usb 3-1.1.4: new SuperSpeedPlus Gen 2 USB device number 10 using xhci_hcd
Mar 15 17:08:27 video kernel: [ 6628.590991] usb 3-1.1.4: New USB device found, idVendor=2109, idProduct=0715, bcdDevice= 1.36
Mar 15 17:08:27 video kernel: [ 6628.590994] usb 3-1.1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Mar 15 17:08:27 video kernel: [ 6628.590995] usb 3-1.1.4: Product: USB3.1 SATA Bridge
Mar 15 17:08:27 video kernel: [ 6628.590996] usb 3-1.1.4: Manufacturer: VIA Labs,Inc.
Mar 15 17:08:27 video kernel: [ 6628.590997] usb 3-1.1.4: SerialNumber: 0000000000000004
Mar 15 17:08:27 video kernel: [ 6628.652602] scsi host19: uas
Mar 15 17:08:29 video kernel: [ 6630.780028] scsi 19:0:0:0: Direct-Access     WDC WD40 EURX-63BMCY0     80.0 PQ: 0 ANSI: 6
Mar 15 17:08:29 video kernel: [ 6630.781591] sd 19:0:0:0: Attached scsi generic sg13 type 0
Mar 15 17:08:29 video kernel: [ 6630.782524] sd 19:0:0:0: [sdf] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Mar 15 17:08:29 video kernel: [ 6630.782528] sd 19:0:0:0: [sdf] 4096-byte physical blocks
Mar 15 17:08:29 video kernel: [ 6630.782663] sd 19:0:0:0: [sdf] Write Protect is off
Mar 15 17:08:29 video kernel: [ 6630.782666] sd 19:0:0:0: [sdf] Mode Sense: 2f 00 00 00
Mar 15 17:08:29 video kernel: [ 6630.782864] sd 19:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 15 17:08:29 video kernel: [ 6630.783419] sd 19:0:0:0: [sdf] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Mar 15 17:08:29 video kernel: [ 6630.941347] sd 19:0:0:0: [sdf] Attached SCSI disk
Mar 15 17:08:30 video kernel: [ 6631.197827] md/raid10:md0: Disk failure on sde, disabling device.
Mar 15 17:08:30 video kernel: [ 6631.197827] md/raid10:md0: Operation continuing on 3 devices.

[4]     Model Number:       WDC WD40EURX-63BMCY0                    
        Model Number:       WDC WD40EURX-63BMCY0                    
        Model Number:       WDC WD40EURX-63BMCY0                    
        Model Number:       WDC WD40EURX-63BMCY0                    

[5] 01:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller

[6]
Bus 003 Device 010: ID 2109:0715 VIA Labs, Inc. VL817 SATA Adaptor
Bus 003 Device 008: ID 2109:0715 VIA Labs, Inc. VL817 SATA Adaptor
Bus 003 Device 007: ID 2109:0715 VIA Labs, Inc. VL817 SATA Adaptor
Bus 003 Device 006: ID 2109:0715 VIA Labs, Inc. VL817 SATA Adaptor
Bus 003 Device 004: ID 2109:8822 VIA Labs, Inc. USB3.1 Hub
Bus 003 Device 002: ID 2109:0822 VIA Labs, Inc. USB3.1 Hub             


Reply to: