Re: impending disk failure?

To: debian-user@lists.debian.org
Subject: Re: impending disk failure?
From: Miles Fidelman <mfidelman@meetinghouse.net>
Date: Tue, 20 Oct 2015 09:40:11 -0400
Message-id: <5626443B.5090504@meetinghouse.net>
In-reply-to: <56261127.7090605@vanderhoff.org>
References: <56224A08.7030005@vanderhoff.org> <4827731.P3X1el4hlF@ylum> <56227BA8.6030809@meetinghouse.net> <56261127.7090605@vanderhoff.org>

See below

Tony van der Hoff wrote:

On 17/10/15 17:47, Miles Fidelman wrote:
Dominique Dumont wrote:
On Saturday 17 October 2015 14:15:52 Tony van der Hoff wrote:
Can anyone please explain what it means, and whether I should be
worried?
You should check the drive with smartctl.

See http://www.smartmontools.org/

HTH
Yes.. and be sure to go beyond the basic tests.

First off, make sure it's running:
smartctl -s on -A /dev/disk0   # for each drive, using the
                               # appropriate /dev/..

Then, after it's accumulated some stats:
smartctl -A /dev/disk0

For a lot of drives, the first line - raw read errors, can be very
telling - anything other than 0, and your disk is failing.
Start-up-time can be telling, if it's increasing.

The thing is, that most drives, except those designed for use in RAID
arrays, mask impending disk failures, by re-reading blocks multiple
times - they often get the data eventually, but your machine keeps
getting slower and slower.
Thanks Miles, and tomás, for your helpful replies.
I apologise for the delay in replying, but I've been away from my desk
a few days.
I have however been doing some extensive googling, and it would appear
that the raw read error count is something of a red herring,
especially when applied to Seagate drives, as these are. Both my
drives have quite high RREC values (in the millions); numbers which are
precisely matched by the Hardware ECC Recovered counts, suggesting
that the RREC is merely an artifact of HDDs being essentially
mechanical devices, pushed to their limits using clever technology.
The SMART extended tests reveal no problems.
The Wikipedia entry https://en.wikipedia.org/wiki/S.M.A.R.T. is
particularly informative on the relative importance of these error
counts; the RREC can be safely ignored, as somebody else here recently
suggested.


You're missing the point.

As the Wikipedia also points out:

<https://en.wikipedia.org/wiki/S.M.A.R.T.#cite_note-seagate1-2>
"Mechanical failures account for about 60% of all drive failures." and
"Further, 36% of drives failed without recording any S.M.A.R.T. error at
all, except the temperature, meaning that S.M.A.R.T. data alone was of
limited usefulness in anticipating failures."

Today's disk drives are designed to PROTECT DATA, AND MAINTAIN ACCESS TO
DATA, until the very moment before the drive fails catastrophically.
The "Hardware ECC Recovered Count" indicates that:
- there are likely to be problems with the underlying media that the ECC
is recovering from, that will only get worse over time
- the recovery takes time, hence the reason your system is slowing down -
the more underlying errors, the more time it takes to recover
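
One way to see whether those counters are actually climbing is to pull
the raw values out of "smartctl -A" on a schedule and compare them over
time. What follows is only a minimal sketch: it assumes the usual
ten-column attribute table that smartctl prints for ATA drives, and the
function name smart_raw and the device /dev/sda are placeholders, not
anything from the thread.

```shell
# Print the raw value (10th column) of one SMART attribute.
# Reads `smartctl -A` output on stdin; prints nothing if the
# attribute is not present in the table.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Hypothetical usage (run as root; /dev/sda is a placeholder):
#   smartctl -A /dev/sda | smart_raw Raw_Read_Error_Rate
#   smartctl -A /dev/sda | smart_raw Hardware_ECC_Recovered
```

Appending the date and these two numbers to a file once a day would
show the trend, which is more telling than any single reading.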

I've never found SMART extended tests to be indicative of anything,
until a disk is nearly dead. Though
http://www.z-a-recovery.com/manual/smart.aspx gives a good list of other
SMART variables that might indicate mechanical failures.
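
As a rough illustration of watching other variables, the sketch below
flags a few sector-health attributes when their raw value is non-zero.
The three attribute names are an assumption (they are names smartctl
commonly reports, not a list taken from the page above), and
smart_nonzero is a made-up helper name.

```shell
# Read `smartctl -A` output on stdin and print NAME=RAW for each of a
# few sector-health attributes whose raw value is not zero.
# The attribute selection here is illustrative only.
smart_nonzero() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 != 0) print $2 "=" $10
        }'
}

# Hypothetical usage (run as root; /dev/sda is a placeholder):
#   smartctl -A /dev/sda | smart_nonzero
```

No output means none of the listed counters has moved off zero; it
does not mean the drive is healthy.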

If your drives are a couple of years old, and your machine is getting
slower, don't engage in wishful thinking - back up and get new drives.


Miles

--
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra
