
Re: impending disk failure?



Does this error occur, by any chance, after resuming from suspend? I had a similar problem caused by some faulty drivers. Running (as root)
echo 0 > /sys/power/pm_async
stops drivers from resuming asynchronously, which might fix the problem.
Or can it be correlated with any other system events? It might help to attach the syslog output with more context before and after the errors.
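
For collecting that context, something along these lines may do. It is only a sketch: it assumes the messages land in /var/log/syslog (adjust for your system), that you are allowed to read that file, and that PATTERN stands in for whatever text the error lines actually contain. The function name log_context is made up for illustration.

```shell
# Print every line matching a pattern, plus N lines of context on each side.
# Usage: log_context PATTERN [FILE] [N]
log_context() {
    pattern=$1
    file=${2:-/var/log/syslog}   # assumed log location; adjust as needed
    lines=${3:-20}               # context lines before and after each match
    # -n prefixes line numbers; -C prints context around each match
    grep -n -C "$lines" -e "$pattern" "$file"
}
```

Matching lines are shown as "NUMBER:text" and context lines as "NUMBER-text", so the errors stay easy to spot in the surrounding output.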

Kind regards,
Ondřej Grover

On Tue, Oct 20, 2015 at 12:02 PM, Tony van der Hoff <tony@vanderhoff.org> wrote:
On 17/10/15 17:47, Miles Fidelman wrote:
Dominique Dumont wrote:
On Saturday 17 October 2015 14:15:52 Tony van der Hoff wrote:
Can anyone please explain what it means, and whether I should be
worried?
You should check the drive with smartctl.

See http://www.smartmontools.org/

HTH

Yes.. and be sure to go beyond the basic tests.

First off, make sure it's running:
smartctl -s on -A /dev/disk0   # for each drive, and using the
                               # appropriate /dev/..

Then, after it's accumulated some stats:
smartctl -A /dev/disk0

For a lot of drives, the first line (raw read errors) can be very
telling: anything other than 0, and your disk is failing.
Start-up time can also be telling, if it's increasing.

The thing is that most drives, except those designed for use in RAID
arrays, mask impending disk failures by re-reading blocks multiple
times. They often get the data eventually, but your machine keeps
getting slower and slower.



Thanks Miles, and tomás, for your helpful replies.

I apologise for the delay in replying, but I've been away from my desk for a few days.

I have, however, been doing some extensive googling, and it would appear that the raw read error count (RREC) is something of a red herring, especially when applied to Seagate drives, as these are. Both my drives have quite high RREC values (in the millions), numbers which are precisely matched by the Hardware ECC Recovered counts, suggesting that the RREC is merely an artifact of HDDs being essentially mechanical devices, pushed to their limits using clever technology. The SMART extended tests reveal no problems.
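
The comparison between the two counters can be scripted. What follows is a minimal sketch, assuming the usual "smartctl -A" table layout in which the attribute name is the second column and the raw value starts in the tenth; the function name rrec_vs_ecc and the device /dev/sdX are placeholders, not anything from the drives discussed here.

```shell
# Read `smartctl -A` output on stdin and report whether the raw values of
# Raw_Read_Error_Rate and Hardware_ECC_Recovered agree.
rrec_vs_ecc() {
    awk '
        $2 == "Raw_Read_Error_Rate"    { rrec = $10 }
        $2 == "Hardware_ECC_Recovered" { ecc  = $10 }
        END {
            if (rrec == "" || ecc == "")
                print "missing"              # drive does not report both
            else if (rrec == ecc)
                print "match " rrec
            else
                print "differ " rrec " " ecc
        }'
}
```

It would be used as "smartctl -A /dev/sdX | rrec_vs_ecc", run as root and repeated for each drive. Not every drive reports both attributes, in which case the function just prints "missing".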

The Wikipedia entry https://en.wikipedia.org/wiki/S.M.A.R.T. is particularly informative on the relative importance of these error counts; the RREC can be safely ignored, as somebody else here recently suggested.

So, back to the original problem; I think tomás hit the nail on the head. I've re-plugged the SATA cables, to no great effect; I have now ordered a couple of new cables, and will see whether that helps.
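
One attribute that may help tell a cabling problem from a failing disk is UDMA_CRC_Error_Count, which on many ATA drives counts transfer errors on the link between drive and controller. The sketch below assumes the same "smartctl -A" table layout (name in column two, raw value in column ten) and that the drive reports the attribute under that name; crc_errors and /dev/sdX are illustrative placeholders.

```shell
# Read `smartctl -A` output on stdin and print the raw UDMA CRC error count,
# or "n/a" if the drive does not report that attribute.
crc_errors() {
    awk '
        $2 == "UDMA_CRC_Error_Count" { print $10; found = 1 }
        END { if (!found) print "n/a" }'
}
```

Running "smartctl -A /dev/sdX | crc_errors" before and some time after swapping the cables gives two numbers to compare. The raw count is generally cumulative, so what matters is whether it keeps growing, not its absolute value.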

Thanks again to all.




--
Tony van der Hoff        | mailto:tony@vanderhoff.org
Buckinghamshire, England |


