Re: writing to an external USB3 HD

=> I had problems with a large (3 TB and over) USB 3.0 external hard drive.

=> The drive has an ext3 file system on a single partition spanning the entire drive.
Initially, when there was less data on the drive, it would mount (and read/write) fine.

Once more than 70% of the disk space was used, the external hard drive would no longer mount when connected via a USB 3.0 port.
dmesg would show something like this:

[ 222.828893] xhci_hcd 0000:00:14.0: OUT Endpoint 02 Context (ep_index 03):
[ 222.828898] xhci_hcd 0000:00:14.0: @ffff8800d50bb080 (virt) @d50bb080 (dma) 0x000001 - ep_info
[ 222.828903] xhci_hcd 0000:00:14.0: @ffff8800d50bb084 (virt) @d50bb084 (dma) 0x2000016 - ep_info2
[ 222.828914] xhci_hcd 0000:00:14.0: @ffff8800d50bb088 (virt) @d50bb088 (dma) 0xd50d4401 - deq
[ 222.828918] xhci_hcd 0000:00:14.0: @ffff8800d50bb090 (virt) @d50bb090 (dma) 0x000000 - tx_info
[ 222.828926] xhci_hcd 0000:00:14.0: @ffff8800d50bb094 (virt) @d50bb094 (dma) 0x000000 - rsvd[0]
[ 222.828933] xhci_hcd 0000:00:14.0: @ffff8800d50bb098 (virt) @d50bb098 (dma) 0x000000 - rsvd[1]
[ 222.828941] xhci_hcd 0000:00:14.0: @ffff8800d50bb09c (virt) @d50bb09c (dma) 0x000000 - rsvd[2]
[ 222.829004] xhci_hcd 0000:00:14.0: Endpoint 0x81 not halted, refusing to reset.
[ 222.829009] xhci_hcd 0000:00:14.0: Endpoint 0x2 not halted, refusing to reset.
[ 222.829016] usb_reset_device returns 0
[ 222.829023] scsi command aborted
[ 222.829028] *** thread sleeping
[ 222.829079] scsi 8:0:0:0: Device offlined - not ready after error recovery
[ 222.829144] usb-storage 3-2:1.0: scan complete
[ 222.923625] xhci_hcd 0000:00:14.0: xhci_hub_status_data: stopping port polling.

However, it mounted fine when connected via a USB 2.0 port.

=> To understand the issue better, I compiled my kernel from source and enabled CONFIG_DYNAMIC_DEBUG, CONFIG_USB_DEBUG and a few other flags that I don't remember now.

However, I never got around to spending more time on understanding the xhci subsystem, so I don't yet know what might be happening.

=> But I guess one thing you could do is check whether the issues also happen when the drive is connected via a USB 2.0 port.

If things work via USB 2.0, then it might be an issue with the xhci subsystem.
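
One way to confirm which speed a device actually negotiated is to read the "speed" attribute that the kernel exposes for each device under /sys/bus/usb/devices. What follows is only a minimal sketch, assuming a Linux sysfs layout; the function name usb_speed_label is made up for illustration and is not an existing tool.

```shell
#!/bin/sh
# Map the value of a sysfs "speed" attribute (in Mbit/s) to a USB generation.
# usb_speed_label is a hypothetical helper name, not an existing tool.
usb_speed_label() {
    case "$1" in
        1.5)  echo "USB 1.x (low speed)" ;;
        12)   echo "USB 1.x (full speed)" ;;
        480)  echo "USB 2.0 (high speed)" ;;
        5000) echo "USB 3.0 (SuperSpeed)" ;;
        *)    echo "other ($1 Mbit/s)" ;;
    esac
}

# List every USB device the kernel knows about with its negotiated speed.
# Entries without a readable "speed" file (e.g. interfaces) are skipped.
for dev in /sys/bus/usb/devices/*; do
    [ -r "$dev/speed" ] || continue
    printf '%s: %s\n' "${dev##*/}" "$(usb_speed_label "$(cat "$dev/speed")")"
done
```

If the drive reports 480 on one port and 5000 on another, that tells you which ports to use for the USB 2.0 versus USB 3.0 comparison.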

On Sun, Jul 12, 2015 at 11:47 PM, Gary Dale <garydale@torfree.net> wrote:

On 12/07/15 08:45 PM, Gary Dale wrote:

On 11/07/15 04:50 PM, Gary Dale wrote:

On 11/07/15 11:40 AM, Gary Dale wrote:

On 11/07/15 06:41 AM, Petter Adsen wrote:

On Sat, 11 Jul 2015 04:53:42 -0400
Gary Dale <garydale@torfree.net> wrote:

Further to the issue, the problem is triggered by reading from the disk.
I can copy files onto the disk without problems but when I issue the cmp
command to compare the copy to the original, I get system error messages
and the drive vanishes.

This doesn't happen immediately. It seems to happen after some significant
file i/o. Interestingly, I've gotten i/o error messages on files that, after
a reboot (remote system so I can't unplug and replug the drive), compare
OK. This leads me to think it's not a disk problem. Also smartctl -H
says the disk is healthy.
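
The copy-then-compare check described above could be wrapped roughly as follows. This is only a sketch: verify_copy is a hypothetical helper name, and it assumes that the last few kernel messages are the interesting ones when a comparison fails.

```shell
#!/bin/sh
# verify_copy is a hypothetical helper: compare a copy against its original
# and, if they differ or cannot be read, show recent kernel messages.
verify_copy() {
    src=$1
    dst=$2
    if cmp -s "$src" "$dst"; then
        echo "OK: $dst"
        return 0
    fi
    echo "MISMATCH or read error: $dst" >&2
    # dmesg may need root on some systems; its failure is ignored here.
    dmesg 2>/dev/null | tail -n 20 >&2
    return 1
}
```

It would be called as "verify_copy /path/to/original /path/to/copy", where both paths are placeholders. Because cmp exits non-zero both for differing files and for read errors, the helper does not distinguish the two cases.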

This might be related to the problems with USB disks Bob Proulx
described back in April; you can find the posting here:

https://lists.debian.org/debian-user/2015/04/msg00105.html

You could also run a selftest on the disk with smartctl to see what
that says. Check the "-t" option in the man page for details.

Petter

- I ran the long test and -H still reports no problems with the disk. The output is:

root@molar:/home/garydale# smartctl -t long /dev/sde
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 209 minutes for test to complete.
Test will complete after Sat Jul 11 08:19:48 2015

Use smartctl -X to abort test.
root@molar:/home/garydale# smartctl -H /dev/sde
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
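
The extended test runs in the background on the drive itself, and its outcome is recorded in the drive's self-test log, which "smartctl -l selftest" prints. The sketch below checks the most recent log entry. selftest_passed is a hypothetical helper name, and it assumes the ATA-style log layout in which the newest entry is numbered "# 1". Whether the USB bridge passes the log through at all is a separate question.

```shell
#!/bin/sh
# selftest_passed is a hypothetical helper: it reads `smartctl -l selftest`
# output on stdin and succeeds only if the most recent entry (numbered "# 1")
# is reported as "Completed without error".
selftest_passed() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Usage (needs root and a real device; /dev/sde is the device from this thread):
#   smartctl -l selftest /dev/sde | selftest_passed && echo "last self-test OK"
```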

- The disks are usually swapped every week so no disk is mounted for months on end. In this case, the disks are new to accommodate an increased amount of data to be backed up, so I am positive that no disk has been plugged in for more than a week. The offsite backup script has been running flawlessly for a long time with smaller disks plugged into a USB2 port (on a different motherboard).

- The largest files are 25G, the maximum size set for my bacula volumes.

- I have done a reformat on the drive, both a quick one and a full one. The problem persists.

- I'm doubtful that ionice would help. This would be about the only active process on the system which is otherwise idle at the time the script is run. Even with ionice set to idle, it shouldn't have any impact. I can try a combo of nice and ionice to see if it's a speed issue once the office closes.

Trying with both nice and ionice had no effect. :(

Definitely not the USB drives. I'm getting the same problem on four different USB drives I plugged in. Writes work but reads (cmp or rsync) seem to cause problems.

I was able to cp files off a USB drive on a system running stretch with a more recent AMD chipset. Unfortunately this isn't sufficient to tell me if it is a problem with the other motherboard, the chipset or the kernel version. So far as I can see, there is no newer kernel available yet in jessie-backports to reduce the possible problem area further.

Has anyone had any problems with USB3 devices and the AMD 970 chipset?
