
Re: Interpreting the output of ide-smart



On Sun, 2005-03-06 at 10:38 -0500, Scott V. McGuire wrote:
> After a string of hard drive failures I've been trying to monitor my
> drives more carefully.  I had ide-smart run the offline tests and got
> results.  Can anyone shed some light on how they should be
> interpreted?  For example, in
> 
> Id=202  Status=10  {Advisory    Online }  Value=253  Threshold=  0  Passed
> Id=203  Status=11  {Prefailure  Online }  Value=253  Threshold=180  Passed
> 
> I think I should read the column which says 'Advisory' or 'Prefailure'
> as a description of the test, not the result.  In which case, they
> passed so I shouldn't worry.  But I could also interpret the second
> line as saying "The drive passes now but is about to fail".  Is either
> correct?

I don't know about ide-smart, but I've had drive issues too and now use
smartmontools. I configured its daemon, smartd, to run a short self-test
daily and a long one once a week, with an entry like this in
/etc/smartd.conf:
/dev/hda -a -o on -S on -s (S/../.././04|L/../../6/04) -I 194 -m root
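
In case the -s argument looks cryptic: according to the smartd.conf man
page, smartd builds a string of the form T/MM/DD/d/HH (test type, month,
day of month, day of week with 1 = Monday, hour) and runs test type T when
the regular expression matches it. So S/../.././04 asks for a short test
every day during the 04:00 hour, and L/../../6/04 for a long test every
Saturday during the 04:00 hour. Here is a minimal Python sketch of that
matching, assuming the whole string must match; scheduled_tests is a
hypothetical helper, not part of smartmontools.

```python
import re
from datetime import datetime

# The -s argument from the smartd.conf entry above.
SCHEDULE = r"(S/../.././04|L/../../6/04)"


def scheduled_tests(when: datetime, pattern: str = SCHEDULE) -> list:
    """Return the self-test types whose T/MM/DD/d/HH string matches `pattern`."""
    matches = []
    for test_type in ("L", "S"):
        # MM = month, DD = day of month, d = day of week (1 = Monday ... 7 = Sunday), HH = hour
        stamp = (f"{test_type}/{when.month:02d}/{when.day:02d}/"
                 f"{when.isoweekday()}/{when.hour:02d}")
        # Assumption: smartd requires a full-string match, hence re.fullmatch.
        if re.fullmatch(pattern, stamp):
            matches.append(test_type)
    return matches


print(scheduled_tests(datetime(2005, 3, 5, 4)))  # Saturday 04:00 -> ['L', 'S']
print(scheduled_tests(datetime(2005, 3, 6, 4)))  # Sunday 04:00   -> ['S']
print(scheduled_tests(datetime(2005, 3, 6, 5)))  # Sunday 05:00   -> []
```

The sketch only reports which patterns match. It does not model what
smartd does when a short and a long test fall into the same hour.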

Apart from that, I have a daily cron job that mails me the output of
smartctl -a /dev/hda

That output contains all the drive status attributes SMART can report, for example:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   065   065   000    Pre-fail  Always       -       6016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       18
....
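
To check these values yourself, the attribute table can be split into
columns. A minimal Python sketch, assuming the ten-column layout shown
above (raw values can contain spaces on some drives, so everything from
the tenth column on is kept as text); parse_attributes and
below_threshold are hypothetical names. It also assumes the usual SMART
rule that an attribute counts as failed when its normalized VALUE drops
to or below a non-zero THRESH, which is how to read Value=253 against
Threshold=180 in the quoted lines: the drive is still comfortably above
the threshold.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    attr_id: int
    name: str
    value: int      # normalized current value
    worst: int      # lowest normalized value recorded so far
    thresh: int     # failure threshold (0 = never fails)
    attr_type: str  # "Pre-fail" or "Old_age"
    raw: str        # vendor-specific raw value, kept as text

    def below_threshold(self) -> bool:
        # Assumed rule: a non-zero threshold is crossed when VALUE <= THRESH.
        return self.thresh > 0 and self.value <= self.thresh


def parse_attributes(text: str) -> dict:
    """Map attribute name -> Attribute for every attribute row of `smartctl -a` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG column;
        # this skips the header line and the other sections of the report.
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        attrs[fields[1]] = Attribute(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            attr_type=fields[6],
            raw=" ".join(fields[9:]),
        )
    return attrs


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   065   065   000    Pre-fail  Always       -       6016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       18
"""

for a in parse_attributes(sample).values():
    print(a.name, a.attr_type, a.raw, "FAILING" if a.below_threshold() else "ok")
```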

Now, you'll want to watch the raw values of the Pre-fail attributes,
since, as their type suggests, they tend to increase (rapidly) before a
failure. In all my disk problems this has been particularly true for
Raw_Read_Error_Rate, so I watch that one closely. Small values like
3 or 10 are OK, but a rapid increase to hundreds or thousands over a
couple of days means the drive is about to die, typically within the
next one or two days. Individual blocks may already be unreadable at
that point, so back up the drive immediately.

Note that even in that case, SMART might still report that the drive
PASSED its self-test, so a PASS alone should not reassure you. It
definitely makes sense to check the attribute values yourself.

Regards, Bruno.




