
Re: can use of smartctl prevent disk probs?

Hugo Vanwoerkom wrote:
> Maurits van Rees wrote:
>> On Mon, Oct 31, 2005 at 02:12:32PM -0600, Hugo Vanwoerkom wrote:
>>> Since having failures with /dev/hdc, I have been paying attention to smartctl.
>>> It is a, how should one say it, interesting program that provides lots
>>> of data.
>>> But it is not clear to me how you can prevent disk failure other than
>>> buying a disk with 5 year warranty, or something.
>> That also won't prevent disk failure.  It will just get you your money
>> back or get you a new hard disk.  If it fails your data is still lost.
>> A warranty is no replacement for a backup regime.  But yes, I would
>> sooner trust and buy a hard disk with a warranty of five years than of
>> one year. :)
>> Smartctl just gives you advance warning when your hard disk is at risk
>> of failure.  When you receive such a signal then make a backup and get
>> ready to insert a new hard disk.
>> I use logcheck so I get an hourly email about events on my system.
>> There are some messages from smartctl that I don't worry about, so I
>> let logcheck ignore them.  Here are the relevant lines that I have in
>> /etc/logcheck/ignore.d.workstation/local.  These probably don't work
>> well for your system, so adapt as you see fit.
>> ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ smartd\[[0-9]+\]: Device: /dev/hdb, SMART Usage Attribute: 3 Spin_Up_Time changed from ([89][0-9]|1[0-9]{2}) to ([89][0-9]|1[0-9]{2})$
>> ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ smartd\[[0-9]+\]: Device: /dev/hdc, SMART Usage Attribute: 194 Temperature_Celsius changed from [0-9]{3} to [0-9]{3}$
>> These lines have been here for months, so for my system these messages
>> are no reason to run to the computer shop. ;) But I do make daily
>> backups of course.
> That's helpful, thanks.
> I've noticed a few more besides "Spin Up Time":
> "Seek Time Performance" and "Run Out Cancel".
> Funny thing is, they are marked pre-fail on a disk that I now use, an
> "old" Maxtor (Power Cycle Ct = 704).
> But the disk that I don't trust any more is only 6 months old (Power
> Cycle Ct = 204) and doesn't have any of those except UDMA CRC Error
> Ct=92. It failed with I/O errors when I was running on it.
> Is there a good site to find out what those names actually mean?

I'm going through similar pain with this:

Device: /dev/hda, Failed SMART usage Attribute: 9 Power_On_Seconds.

It's not one of the "usual" attributes that comes up.  The drive hasn't
failed a self test yet, but it's very frustrating, trying to find out
what the significance of these attributes is.
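
For what it's worth, one way to make sense of a "Failed SMART usage Attribute" message is to look at the normalized VALUE and THRESH columns that `smartctl -A` prints, together with the TYPE column. As far as I understand it, an attribute counts as failed when its normalized value is at or below a non-zero threshold; for an Old_age (usage) attribute that is an advisory about age or wear, while for a Pre-fail attribute it is a warning of likely failure. Below is a rough Python sketch of that reading. The sample table is invented for illustration (it is not output from any drive in this thread), and the helper names parse_attributes and status are made up.

```python
# Hypothetical sample in the column layout printed by `smartctl -A`.
# All numbers are invented for illustration, not taken from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   095   090   063    Pre-fail  Always       -       4500
  9 Power_On_Seconds        0x0032   001   001   005    Old_age   Always   FAILING_NOW 123456
194 Temperature_Celsius     0x0022   040   035   000    Old_age   Always       -       40
199 UDMA_CRC_Error_Count    0x0008   200   200   000    Old_age   Offline      -       92
"""


def parse_attributes(text):
    """Turn the attribute table into a list of dicts, skipping the header."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),   # normalized value, not the raw one
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],         # "Pre-fail" or "Old_age"
            "raw": fields[9],
        })
    return rows


def status(attr):
    """Classify one attribute by comparing normalized value to threshold.

    Assumption: a threshold of 0 means the attribute is informational only
    and is never reported as failed.
    """
    if attr["thresh"] > 0 and attr["value"] <= attr["thresh"]:
        if attr["type"] == "Pre-fail":
            return "FAILED (pre-fail)"
        return "FAILED (usage)"
    return "ok"


if __name__ == "__main__":
    for attr in parse_attributes(SAMPLE):
        print(attr["id"], attr["name"], status(attr))
```

To try it on a real disk, the output of `smartctl -A /dev/hda` could be passed to parse_attributes in place of SAMPLE. Note that the raw values (such as the 92 CRC errors mentioned earlier in the thread) are vendor-specific and are not what the threshold is compared against.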
