External (USB) disk errors causing hangs and 100% CPU core usage
Hi,
I have a fairly new external USB disk that frequently throws errors.
When these occur, the disk becomes partially or totally inaccessible.
Even worse, when I try to umount it, the umount hangs indefinitely,
using 100% of one of my CPU cores. I haven't yet found any way to
recover short of a hard reset (long press of the power button); the
soft methods (halt, reboot, poweroff) all hang as well. SMART seems to
show no errors, and I have yet to discover any actual data corruption:
I typically run fsck after rebooting, and it sometimes reports no
problems and sometimes a couple of simple ones that it can fix. The whole disk
is a physical volume for a dm-crypt / cryptsetup [LUKS] encrypted
volume. The system is currently Jessie with systemd, but I believe I
began having more or less the same problems with my previous Wheezy
install. The machine is a ThinkPad T61. Any ideas?
$ lsusb
Bus 007 Device 002: ID 1058:0830 Western Digital Technologies, Inc.
# smartctl -a /dev/sdb -d sat
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.18.19-lizzie] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Elements / My Passport (USB, AF)
Device Model: WDC WD10JMVW-11AJGS3
Serial Number: WD-WXR1E848NUEZ
LU WWN Device Id: 5 0014ee 60534064a
Firmware Version: 01.01A01
User Capacity: 1,000,171,332,096 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
...
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   154   109   021    Pre-fail  Always       -       3266
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       213
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       632
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       134
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       42
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2008
194 Temperature_Celsius     0x0022   113   093   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
SMART Error Log Version: 1
No Errors Logged
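The SMART output above can be checked mechanically rather than by eye. What follows is only a rough sketch, assuming the attribute rows keep the column order shown above; the `WATCHED` list and the function names `parse_attributes` and `suspicious` are made up for illustration and are not part of smartmontools. It flags a watched attribute with a non-zero raw value, or any attribute whose normalized value has dropped to its threshold.

```python
import re

# Attributes whose non-zero raw value often hints at failing media or a bad
# link. This list is an assumption for illustration, not a smartmontools rule.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {name: (value, worst, thresh, raw)} for each attribute row."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            _id, name, value, worst, thresh, raw = m.groups()
            attrs[name] = (int(value), int(worst), int(thresh), int(raw))
    return attrs


def suspicious(attrs):
    """Return sorted names of attributes that look worrying."""
    found = []
    for name, (value, _worst, thresh, raw) in attrs.items():
        if name in WATCHED and raw != 0:
            found.append(name)
        elif thresh > 0 and value <= thresh:
            found.append(name)
    return sorted(found)


# Three rows copied from the smartctl output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""

if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)))
```

For the values posted here nothing is flagged, which matches the observation that SMART shows no errors; that in turn suggests the I/O errors may originate somewhere other than the platters (USB bridge, cable, power), though this sketch cannot confirm that.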
[A typical batch of errors found in syslog when the disk goes offline:]
Jul 27 13:46:21 lizzie kernel: [81183.018191] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 25690113, block 0)
Jul 27 13:46:21 lizzie kernel: [81183.019891] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 46925124, block 0)
Jul 27 13:46:22 lizzie kernel: [81184.154747] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 25690113, block 0)
Jul 27 13:46:23 lizzie kernel: [81184.943103] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 2, block 0)
Jul 27 13:46:23 lizzie kernel: [81184.943147] EXT4-fs error (device dm-6): __ext4_get_inode_loc:3809: inode #2: block 1057: comm mc: unable to read itable block
Jul 27 13:46:23 lizzie kernel: [81184.943269] Buffer I/O error on dev dm-6, logical block 0, lost sync page write
Jul 27 13:46:23 lizzie kernel: [81184.943282] EXT4-fs error (device dm-6) in ext4_reserve_inode_write:4775: IO failure
Jul 27 13:46:23 lizzie kernel: [81184.943285] EXT4-fs (dm-6): previous I/O error to superblock detected
Jul 27 13:46:23 lizzie kernel: [81184.943383] Buffer I/O error on dev dm-6, logical block 0, lost sync page write
Jul 27 13:46:24 lizzie kernel: [81186.059725] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 2, block 0)
Jul 27 13:46:28 lizzie kernel: [81190.720273] Aborting journal on device dm-6-8.
Jul 27 13:46:28 lizzie kernel: [81190.720436] Buffer I/O error on dev dm-6, logical block 121667584, lost sync page write
Jul 27 13:46:28 lizzie kernel: [81190.720475] JBD2: Error -5 detected when updating journal superblock for dm-6-8.
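To make a batch like the one above easier to read, the negative numbers in the kernel messages can be mapped back to errno names; `-5` is `EIO` (input/output error), which means the reads and writes failed below the filesystem. This is a small sketch, assuming the message formats shown above; the function name `summarize` and the two regular expressions are illustrative only.

```python
import errno
import re
from collections import Counter

# Matches "error -5" and "Error -5" as they appear in the EXT4/JBD2 messages.
ERRNO = re.compile(r"\b[Ee]rror (-\d+)\b")
# Matches "(ino 25690113, block 0)" but not "inode #2".
INODE = re.compile(r"\bino (\d+)")


def summarize(log_text):
    """Count errno names and affected inode numbers in kernel log text."""
    codes = Counter()
    inodes = Counter()
    for line in log_text.splitlines():
        for code in ERRNO.findall(line):
            number = -int(code)
            # Fall back to the bare number if the errno is unknown here.
            codes[errno.errorcode.get(number, str(number))] += 1
        m = INODE.search(line)
        if m:
            inodes[int(m.group(1))] += 1
    return codes, inodes


# Three lines copied from the syslog excerpt above.
SAMPLE_LOG = """\
Jul 27 13:46:21 lizzie kernel: [81183.018191] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 25690113, block 0)
Jul 27 13:46:23 lizzie kernel: [81184.943103] EXT4-fs warning (device dm-6): __ext4_read_dirblock:884: error -5 reading directory block (ino 2, block 0)
Jul 27 13:46:28 lizzie kernel: [81190.720475] JBD2: Error -5 detected when updating journal superblock for dm-6-8.
"""

if __name__ == "__main__":
    codes, inodes = summarize(SAMPLE_LOG)
    print(dict(codes))
    print(dict(inodes))
```

Inode 2 is the ext4 root directory, so once it appears in the tally the whole filesystem is effectively unreadable, consistent with the disk having gone offline rather than a few bad sectors.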
Celejar