Re: Disk heads won't park [part II]
Reviving this thread since I tried turning the machine on again (and
maybe another thread will bump this one).
And again (not that I expected it to go away), as soon as the machine
starts, right after POST and even before GRUB, the drive starts making
"reading noise" (like when an antivirus is scanning or the system is
thrashing). The only way to stop it is hdparm -y (no surprise there).
This is a Seagate Barracuda ST31000528AS drive with the CC49 firmware
upgrade. Here are a few other commands I tried:
~# hdparm -Z /dev/sdd
/dev/sdd:
disabling Seagate auto powersaving mode
HDIO_DRIVE_CMD(seagatepwrsave) failed: Input/output error
~# hdparm -B /dev/sdd
APM_level = not supported
The message "Incorrect metadata area header checksum on /dev/sdd1 at
offset 4096" shows up in dmesg and during LVM operations. Here's some
SMART fun:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 113 099 006 - 50643073
3 Spin_Up_Time PO---- 095 095 000 - 0
4 Start_Stop_Count -O--CK 099 099 020 - 1274
5 Reallocated_Sector_Ct PO--CK 047 047 036 - 2181
7 Seek_Error_Rate POSR-- 075 060 030 - 35143152
9 Power_On_Hours -O--CK 079 079 000 - 18750
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 638
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 099 000 - 1
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 071 051 045 - 29 (Min/Max 22/29)
194 Temperature_Celsius -O---K 029 049 000 - 29 (0 11 0 0)
195 Hardware_ECC_Recovered -O-RC- 026 018 000 - 50643073
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 178838143258596
241 Total_LBAs_Written ------ 100 253 000 - 1457922426
242 Total_LBAs_Read ------ 100 253 000 - 1552877542
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
I've also run a few self-tests, but most of them show as Aborted even
when I let them run for hours:
=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Self-test routine in progress 90% 18752 -
# 2 Short offline Aborted by host 90% 18752 -
# 3 Short offline Aborted by host 90% 18751 -
# 4 Short offline Aborted by host 90% 18751 -
# 5 Short offline Aborted by host 90% 18751 -
# 6 Extended offline Aborted by host 90% 18750 -
# 7 Extended offline Completed without error 00% 18746 -
# 8 Extended offline Aborted by host 90% 18742 -
# 9 Extended offline Interrupted (host reset) 90% 18742 -
#10 Short offline Interrupted (host reset) 00% 18741 -
#11 Short offline Completed without error 00% 18653 -
# smartctl -A /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 113 099 006 Pre-fail Always - 50652210
3 Spin_Up_Time 0x0003 095 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 020 Old_age Always - 1283
5 Reallocated_Sector_Ct 0x0033 047 047 036 Pre-fail Always - 2181
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 35273246
9 Power_On_Hours 0x0032 079 079 000 Old_age Always - 18754
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 638
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 1
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 054 051 045 Old_age Always - 46 (Min/Max 22/47)
194 Temperature_Celsius 0x0022 046 049 000 Old_age Always - 46 (0 11 0 0)
195 Hardware_ECC_Recovered 0x001a 026 018 000 Old_age Always - 50652210
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 219034742181864
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 1457922426
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 1552877978
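
Side note on the two tables above: the first one prints the attribute
flags as letters (POSR--) and the second as hex (0x000f). As far as I
can tell they are the same bits, using the legend printed under the
first table. A minimal sketch of the mapping, assuming P is the lowest
bit and K the highest of the six (decode_flags is just a name I made
up for this post, not part of smartmontools):

```python
# Bit-to-letter mapping taken from the legend under the first table.
# Assumption: P is bit 0 and K is bit 5.
FLAG_BITS = [
    (0x01, "P"),  # prefailure warning
    (0x02, "O"),  # updated online
    (0x04, "S"),  # speed/performance
    (0x08, "R"),  # error rate
    (0x10, "C"),  # event count
    (0x20, "K"),  # auto-keep
]


def decode_flags(flag):
    """Return the six-character letter string for a numeric FLAG value."""
    return "".join(letter if flag & bit else "-" for bit, letter in FLAG_BITS)


if __name__ == "__main__":
    # Hex values copied from the FLAG column of the smartctl -A output.
    for hex_flag in ("0x000f", "0x0032", "0x0033", "0x003e"):
        print(hex_flag, decode_flags(int(hex_flag, 16)))
```

With that mapping the hex column matches the letter column for every
attribute listed here, so the two outputs agree on the flags.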
Then I ran ~# smartctl -l error /dev/sdd every minute or so; these two
change a lot:
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 35274663
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 35274722
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 35274773
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 35274811
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 35274850
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 35274971
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 35275013
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 239070764617704
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 251835407421416
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 262632955203560
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 271287314305000
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 279546536415208
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 23424751652840
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 32220844675048
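
The huge Head_Flying_Hours numbers may just be a packed raw field
rather than a single counter. A rough sketch, assuming the layout I
have seen described for Seagate drives (not confirmed by anything in
this thread): the 48-bit raw value splits into an upper 16-bit part and
a lower 32-bit part, with the lower part often read as hours for
attribute 240 and as a seek count for attribute 7. split_raw48 is a
hypothetical helper name.

```python
from typing import Tuple


def split_raw48(raw: int) -> Tuple[int, int]:
    """Split a 48-bit SMART raw value into (upper 16 bits, lower 32 bits).

    Assumption: the drive packs two counters into one raw field. This is
    a commonly described Seagate layout, not something the drive reports.
    """
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF


if __name__ == "__main__":
    # Raw values copied from the smartctl output in this post.
    samples = [
        ("Head_Flying_Hours", 178838143258596),
        ("Head_Flying_Hours", 219034742181864),
        ("Head_Flying_Hours", 23424751652840),
        ("Seek_Error_Rate", 35274663),
    ]
    for name, raw in samples:
        upper, lower = split_raw48(raw)
        print(f"{name:20s} upper16={upper:6d} lower32={lower}")
```

If that reading is right, the lower part of Head_Flying_Hours barely
moves between samples and only the upper part jumps and wraps around,
while the upper part of Seek_Error_Rate stays at zero. That would mean
the "changes a lot" is mostly a counter ticking, not errors piling up,
but I can't confirm the layout.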
I do assume it is failing, but I'd like to know why, and which values
are really telltale (for instance, the WHEN_FAILED column above is
empty, so I can't really draw any conclusions).
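
As I understand it, SMART only flags an attribute as failed once its
normalized VALUE drops to or below THRESH, which would explain the
empty WHEN_FAILED column. A rough way to see which attributes are
closest to that point is to rank them by VALUE minus THRESH. The sketch
below assumes single-line rows like the first attribute table in this
post; SAMPLE holds a few rows copied from it and margins is a
hypothetical helper name.

```python
# A few rows copied from the first attribute table in this post.
SAMPLE = """\
1 Raw_Read_Error_Rate POSR-- 113 099 006 - 50643073
5 Reallocated_Sector_Ct PO--CK 047 047 036 - 2181
7 Seek_Error_Rate POSR-- 075 060 030 - 35143152
9 Power_On_Hours -O--CK 079 079 000 - 18750
190 Airflow_Temperature_Cel -O---K 071 051 045 - 29 (Min/Max 22/29)
"""


def margins(text):
    """Return (VALUE-THRESH, WORST-THRESH, name) tuples, smallest margin first."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Expect: ID NAME FLAGS VALUE WORST THRESH FAIL RAW...
        if len(parts) < 8 or not parts[0].isdigit():
            continue
        value, worst, thresh = (int(p) for p in parts[3:6])
        if thresh == 0:
            # A zero threshold cannot be crossed, so skip those rows.
            continue
        rows.append((value - thresh, worst - thresh, parts[1]))
    return sorted(rows)


if __name__ == "__main__":
    for value_margin, worst_margin, name in margins(SAMPLE):
        print(f"{name:26s} value margin={value_margin:4d} worst margin={worst_margin:4d}")
```

On these rows Reallocated_Sector_Ct comes out with the smallest margin
(047 against a threshold of 036), which fits the 2181 reallocated
sectors in the raw column.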
This is a recently installed, headless system with almost nothing installed.
Thanks,
Nuno