Re: external (USB) disk errors causing hangs and 100% cpu core usage

To: debian-user@lists.debian.org
Subject: Re: external (USB) disk errors causing hangs and 100% cpu core usage
From: Gary Dale <garydale@torfree.net>
Date: Mon, 27 Jul 2015 23:40:10 -0400
Message-id: <55B6F99A.6020001@torfree.net>
Reply-to: gary@extremeground.com
In-reply-to: <20150727220831.bc318adba0a8812d9870e6bc@gmail.com>
References: <20150727220831.bc318adba0a8812d9870e6bc@gmail.com>

On 27/07/15 10:08 PM, Celejar wrote:

Hi,

I have a fairly new external USB disk that frequently throws errors.
When these occur, the disk becomes partially or totally inaccessible.
Even worse, when I try to umount it, the umount hangs indefinitely,
using 100% of one of my cpu cores. I haven't yet found any way to
recover short of hard booting (long press of the power button) the
system (soft booting - halt, reboot, poweroff - hang). smart seems to
show no errors, and I have yet to discover any actual data corruption -
I typically run fsck upon reboot, and it sometimes reports no problems,
and sometimes a couple of simple ones that it can fix. The whole disk
is a physical volume for a dm-crypt / cryptsetup [LUKS] encrypted
volume. The system is currently Jessie with systemd, but I believe I
began having more or less the same problems with my previous Wheezy
install. The machine is a ThinkPad T61. Any ideas?

$ lsusb
Bus 007 Device 002: ID 1058:0830 Western Digital Technologies, Inc.

# smartctl -a /dev/sdb -d sat
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.18.19-lizzie] (local
build) Copyright (C) 2002-14, Bruce Allen, Christian Franke,
www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Elements / My Passport (USB, AF)
Device Model:     WDC WD10JMVW-11AJGS3
Serial Number:    WD-WXR1E848NUEZ
LU WWN Device Id: 5 0014ee 60534064a
Firmware Version: 01.01A01
User Capacity:    1,000,171,332,096 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)

...

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
   3 Spin_Up_Time            0x0027   154   109   021    Pre-fail  Always       -       3266
   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       213
   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       632
  10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
  11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       134
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       42
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2008
194 Temperature_Celsius     0x0022   113   093   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

[A typical batch of errors found in syslog when the disk goes offline:]

Jul 27 13:46:21 lizzie kernel: [81183.018191] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 25690113, block 0)
Jul 27 13:46:21 lizzie kernel: [81183.019891] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 46925124, block 0)
Jul 27 13:46:22 lizzie kernel: [81184.154747] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 25690113, block 0)
Jul 27 13:46:23 lizzie kernel: [81184.943103] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 2, block 0)
Jul 27 13:46:23 lizzie kernel: [81184.943147] EXT4-fs error (device dm-6): __ext4_get_inode_loc:3809: inode #2: block 1057: comm mc: unable to read itable block
Jul 27 13:46:23 lizzie kernel: [81184.943269] Buffer I/O error on dev dm-6, logical block 0, lost sync page write
Jul 27 13:46:23 lizzie kernel: [81184.943282] EXT4-fs error (device dm-6) in ext4_reserve_inode_write:4775: IO failure
Jul 27 13:46:23 lizzie kernel: [81184.943285] EXT4-fs (dm-6): previous I/O error to superblock detected
Jul 27 13:46:23 lizzie kernel: [81184.943383] Buffer I/O error on dev dm-6, logical block 0, lost sync page write
Jul 27 13:46:24 lizzie kernel: [81186.059725] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 2, block 0)
Jul 27 13:46:28 lizzie kernel: [81190.720273] Aborting journal on device dm-6-8.
Jul 27 13:46:28 lizzie kernel: [81190.720436] Buffer I/O error on dev dm-6, logical block 121667584, lost sync page write
Jul 27 13:46:28 lizzie kernel: [81190.720475] JBD2: Error -5 detected when updating journal superblock for dm-6-8.

Celejar

I've had similar problems with USB3 drives which seemed to crop up while reading the disk. They were OK on writing. It also required a reboot to correct.

I still haven't resolved it - it's on a headless machine in a remote location so I wanted to try a BIOS upgrade next time I am out that way. My workaround was to connect the drives to USB2 ports instead of USB3. Not ideal but it's working for now.
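
To confirm which speed a drive actually negotiated after moving it to a different port, the kernel reports it in sysfs. What follows is only a rough sketch: the function names usb_speed_label and list_usb_speeds are made up for illustration, and it assumes the usual /sys/bus/usb/devices layout with speed, idVendor and idProduct attributes for each device.

```shell
#!/bin/sh
# Map a sysfs USB "speed" value (in Mbit/s) to a readable label.
# (usb_speed_label is a hypothetical helper name.)
usb_speed_label() {
    case "$1" in
        1.5)   echo "USB 1.x low speed" ;;
        12)    echo "USB 1.x full speed" ;;
        480)   echo "USB 2.0 high speed" ;;
        5000)  echo "USB 3.0 SuperSpeed" ;;
        10000) echo "USB 3.1 SuperSpeed+" ;;
        *)     echo "unknown ($1)" ;;
    esac
}

# Walk the USB devices exposed under sysfs and print the vendor:product ID
# and negotiated speed of each. Interface entries have no "speed" file and
# are skipped. (list_usb_speeds is a hypothetical helper name.)
list_usb_speeds() {
    for dev in /sys/bus/usb/devices/*; do
        [ -r "$dev/speed" ] && [ -r "$dev/idVendor" ] || continue
        id="$(cat "$dev/idVendor"):$(cat "$dev/idProduct")"
        speed="$(cat "$dev/speed")"
        printf '%s  %s  %s\n' "$id" "$speed" "$(usb_speed_label "$speed")"
    done
}

list_usb_speeds
```

A drive on a USB2 port should show 480 and one on a USB3 port 5000. The ID column is the same vendor:product pair that lsusb prints (1058:0830 for the drive in the original post), and `lsusb -t` shows similar per-port speed information.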
