Re: Dying hard drive?
Hello Miles,
On Fri, Oct 18, 2013 at 11:43:59AM -0400, Miles Fidelman wrote:
> Do a smartctl -A /dev/sd[abcd] - look for non-zero raw read errors
> and reallocated sector counts. I've found, at least for the WD
> drives I use in my servers - anything other than a 0 raw-read-error
> count is a sign of near-term disk failure. The first time I
> encountered the symptoms you report, it took me a LONG time to
> figure it out. The basic SMART test is useless.
IIUIC, this is the output I should be looking at:
sda:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 095 094 006 Pre-fail Always - 230210521
5 Reallocated_Sector_Ct 0x0033 096 096 036 Pre-fail Always - 5832
sdb:
1 Raw_Read_Error_Rate 0x000f 119 082 006 Pre-fail Always - 234455192
5 Reallocated_Sector_Ct 0x0033 099 099 036 Pre-fail Always - 2320
sdc:
1 Raw_Read_Error_Rate 0x000f 115 099 006 Pre-fail Always - 87852008
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
sdd:
1 Raw_Read_Error_Rate 0x000f 118 099 006 Pre-fail Always - 187317944
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
According to this, all four drives are bad, since none of them has a raw
read error count of 0, but only sda actually misbehaves. Three more
attributes are reported as Pre-fail: Spin_Up_Time, Seek_Error_Rate and
Spin_Retry_Count. Full output for sda:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 095 094 006 Pre-fail Always - 230231673
3 Spin_Up_Time 0x0003 097 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 3
5 Reallocated_Sector_Ct 0x0033 096 096 036 Pre-fail Always - 5832
7 Seek_Error_Rate 0x000f 087 060 030 Pre-fail Always - 471820319
9 Power_On_Hours 0x0032 089 089 000 Old_age Always - 9750
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 4
183 Runtime_Bad_Block 0x0032 098 098 000 Old_age Always - 2
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 386
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 099 099 000 Old_age Always - 1
190 Airflow_Temperature_Cel 0x0022 063 056 045 Old_age Always - 37 (Min/Max 22/44)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 1
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 1266
194 Temperature_Celsius 0x0022 037 044 000 Old_age Always - 37 (0 22 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 174414326932768
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 23166370191361
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 174661697951516
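To pick the interesting rows out of a table like the one above without reading it by eye, something along these lines might help. It is only a minimal sketch: the function names and the set of watched attribute IDs (5, 187, 197, 198) are my own choice for illustration, not anything smartctl defines, and the sample text is just a few rows copied from the sda output.

```python
import re

# Attribute IDs whose non-zero raw value is treated as a warning sign here.
# This selection is an assumption made for illustration, not an official list.
WATCHED = {5, 187, 197, 198}

# A few rows copied from the sda table above.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 095 094 006 Pre-fail Always - 230231673
5 Reallocated_Sector_Ct 0x0033 096 096 036 Pre-fail Always - 5832
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 386
190 Airflow_Temperature_Cel 0x0022 063 056 045 Old_age Always - 37 (Min/Max 22/44)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
"""


def parse_smart_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into a dict keyed by ID."""
    attrs = {}
    for line in text.splitlines():
        # Split into at most 10 columns so a raw value with spaces stays whole.
        fields = line.split(None, 9)
        # Attribute rows have 10 columns and start with a numeric ID;
        # this skips the header row and any other text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Keep only the leading integer of the raw value, e.g. "37 (Min/Max 22/44)".
        match = re.match(r"\d+", fields[9])
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(match.group()) if match else None,
        }
    return attrs


def flag_suspicious(attrs):
    """Return (id, name, raw) for watched attributes with a non-zero raw value."""
    return [
        (attr_id, attr["name"], attr["raw"])
        for attr_id, attr in sorted(attrs.items())
        if attr_id in WATCHED and attr["raw"]
    ]


flagged = flag_suspicious(parse_smart_attributes(SAMPLE))
for attr_id, name, raw in flagged:
    print(f"{attr_id:>3} {name}: raw value {raw}")
```

On the sample rows this reports attributes 5, 187, 197 and 198. To run it against a live drive, the output of smartctl -A for that device would be passed in as the text instead of SAMPLE.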
> Note that this particularly applies if you're not using an
> enterprise-class drive. Standard drives try very hard to read from
> the medium, and take a long time before they give up. Enterprise
> drives assume they're part of a RAID array and just give up,
> throwing an error.
I'm using Seagate ST3000DM001-9YN166 drives, not enterprise-class drives.
Regards,
Veljko