
Re: SMART data, should I change my HDD?



On Mon, 28 Apr 2014 18:43:17 -0400
KS <lists04@fastmail.fm> wrote:

> Hi,
> 
> I was checking one of my systems and the SMART data for /dev/sda came
> out as below. Should I change it to avoid losing data? If not, which
> information in SMART data indicates that it is time to do it?
> 
> Thanks,
> KS
> -------------------
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.34-std312-amd64] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar Black
> Device Model:     WDC WD5001AALS-00J7B1
> Serial Number:    WD-WMATV6698581
> LU WWN Device Id: 5 0014ee 0577e9964
> Firmware Version: 05.00K05
> User Capacity:    500,107,862,016 bytes [500 GB]
> Sector Size:      512 bytes logical/physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Mon Apr 28 18:38:59 2014 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x84) Offline data collection activity
>                                         was suspended by an interrupting command from host.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                (11160) seconds.
> Offline data collection
> capabilities:                    (0x7b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        ( 131) minutes.
> Conveyance self-test routine
> recommended polling time:        (   5) minutes.
> SCT capabilities:              (0x3037) SCT Status supported.
>                                         SCT Feature Control supported.
>                                         SCT Data Table supported.
> 
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   229   221   021    Pre-fail  Always       -       8525
>   4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1124
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7208
>  10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1123
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       31
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1124
> 194 Temperature_Celsius     0x0022   108   101   000    Old_age   Always       -       42
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
> 
> SMART Error Log Version: 1
> No Errors Logged
> 
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
> 
> 
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.

Looks good to me, assuming you're not seeing read or write errors
while running your applications. I'm also assuming you've run the
offline tests, though the self-test log above says none have been
logged yet: Without them, this isn't enough information.
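
The self-tests are started with smartctl's `-t long` option and read
back with `-l selftest`. The sketch below is a minimal illustration,
assuming the smartctl 5.42 output layout quoted above: a hypothetical
helper, `long_test_minutes`, pulls the extended self-test's recommended
polling time (131 minutes for this drive) out of `smartctl -c` output,
so you know how long to wait before checking the log. Other smartctl
versions may lay this out differently.

```shell
# Hypothetical helper (not part of smartmontools): read `smartctl -c`
# output on stdin and print the extended self-test polling time in minutes.
# Assumes the two-line layout shown by smartctl 5.42:
#   Extended self-test routine
#   recommended polling time:        ( 131) minutes.
long_test_minutes() {
  sed -n '/^Extended self-test routine/{
n
s/.*( *\([0-9][0-9]*\)) minutes.*/\1/p
}'
}

# Example run (needs root and a real disk, so it is left commented out):
#   smartctl -t long /dev/sda
#   sleep $(( $(smartctl -c /dev/sda | long_test_minutes) * 60 ))
#   smartctl -l selftest /dev/sda
```

Parsing the polling time rather than hard-coding it keeps the wait
correct for drives whose extended test is much shorter or longer.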

When the raw value of Offline_Uncorrectable (ID 198) gets above 0, I
replace the drive: More bad sectors can be expected. When the raw value
of Current_Pending_Sector (ID 197) gets above 0, I worry a lot, because
that's a bad sign.
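
Those two rules are easy to check from a script. This is a minimal
sketch, not Steve's own tooling: a hypothetical `check_sectors`
function reads `smartctl -A` output on stdin, prints a warning for
Current_Pending_Sector (ID 197) or Offline_Uncorrectable (ID 198) when
its RAW_VALUE is above 0, and exits non-zero if it printed anything. It
assumes the ten-column attribute table quoted above, with the raw value
in the last column.

```shell
# Hypothetical helper: scan `smartctl -A` output on stdin.
# Warns for ID 197 (Current_Pending_Sector) and ID 198
# (Offline_Uncorrectable) when RAW_VALUE (column 10) is above 0,
# and exits with status 1 if any warning was printed.
check_sectors() {
  awk '
    ($1 == 197 || $1 == 198) && ($10 + 0) > 0 {
      printf "WARNING: %s (ID %s) raw value is %s\n", $2, $1, $10
      bad = 1
    }
    END { exit bad }
  '
}

# Example (needs root):
#   smartctl -A /dev/sda | check_sectors || echo "time to think about a new drive"
```

Using the exit status lets the check drop straight into a cron job or
an `if` statement.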

You mentioned in another email that you're concerned about a drive
temperature of 42C. We'd all like our components to run at 32C all the
time, but that's more of a hope than a reality. Depending on your
processor, video card, and ventilation, the ambient temperature inside
your box could be 42C, in which case the box itself is heating up your
drive. Right now I'm testing the new build of my rsync backup server,
which has a new 3TB WD Green drive, by running this command over my
whole backup history:

find /backupserver/stevebup -exec ls -lh {} +

The WD Green is running at 37C, and the WD Blue system disk is running
at 40C. The CPU is 45C and the mobo temp is 36C. This is in a box with
a top-mounted 200mm fan exhausting more than 100 cubic feet per minute,
plus a rear-mounted 120mm exhaust fan, two front-mounted 120mm intake
fans, and a side-mounted 140mm intake fan. Can you imagine what would
be happening in there with fewer fans? A modern desktop processor can
dissipate around 100 watts: That's like putting a 100 watt incandescent
light bulb in there. Without adequate ventilation, the case could turn
into an oven and bake your hard drives.

Hang on a second: I just stopped my test process. Let's see what
happens to the temperatures after a couple of minutes...

Well, it's been about 15 minutes since I killed my test program, so my
backup server has been idling. My CPU is now 42C, the WD Black system
disk is 40C, and the WD Green data disk is now 36C.
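
For the drives, one way to take readings like these is attribute 194
(Temperature_Celsius), whose raw value is the 42 in the quoted output.
The sketch below is a hypothetical helper, `drive_temp`, assuming the
smartctl attribute-table layout quoted above; CPU and motherboard
temperatures come from other tools and aren't covered.

```shell
# Hypothetical helper: print the drive temperature in Celsius from
# `smartctl -A` output on stdin, using the RAW_VALUE of attribute 194
# (Temperature_Celsius). Only the first token of the raw value is kept,
# since some drives append extra text such as min/max readings.
drive_temp() {
  awk '$1 == 194 { print $10; exit }'
}

# Example (needs root), one line per disk:
#   for d in /dev/sda /dev/sdb; do
#     printf '%s: %sC\n' "$d" "$(smartctl -A "$d" | drive_temp)"
#   done
```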

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance

