
Re: is this hard disk failure?



On Tue, 2011-06-07 at 09:02 -0400, Miles Fidelman wrote:
> Ralf Mardorf wrote:
> > For me a hard disc has never broken without a click-click-click noise
> > before it failed, but it's very common that cables and connections fail.
> >
> >    
> 
> By the time a disk gets to the click-click-click phase,

A phase everybody knows from modern HDDs :D, but it's possible to get data
even from a disk that won't release its heads anymore [1].
For the Atari I've got a 42MB SCSI drive connected to a Lacom adaptor; it
sometimes needs several boots, but it's unbreakable.

>  there has been 
> LOTS of warning - it's just that today's disks include lots of internal 
> fault-recovery mechanisms that hide things from you, unless you run 
> SMART diagnostics (and not just the basic "smart status" either).
> 
> For example, if you have a machine that's suddenly running VERY slowly

Correct! Or, put differently: if Voodoo seems to have an impact on your
machine, it's seldom Voodoo; it's usually a broken HDD.

>  - 
> it's a good sign that a drive is experiencing internal read errors (unless 
> it's a laptop - a shorted battery is a good suspect).  Both are lessons 
> learned the hard way, and not forgotten.
> 
> Turns out that modern drives have onboard processors that retry reads 
> multiple times - good for protecting data if you only have the one copy 
> on that drive, at the expense of slower disk access.  Not so good if:
> 
> a. you don't notice that it's happening (the disk will eventually fail 
> hard), or,
> 
> b. you're running RAID - instead of the drive dropping out of the array, 
> the entire array slows down as it waits for the failing drive to 
> (eventually) respond
> 
> In either case, you'll tear your hair out trying to figure out why your 
> machine is running slowly  (is it a virus, a file lock that didn't 
> release, etc., etc., etc.).
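
A rough way to see whether reads are stalling, as a sketch only: time a
series of raw reads and flag the ones that take far longer than the
median. The function names, the chunk size and the "10 times the median"
threshold are my own assumptions, not anything from this thread. Reading
a /dev node usually needs root, and data already in the page cache will
look fast no matter how sick the drive is.

```python
import time


def time_reads(path, chunk_size=1024 * 1024, max_chunks=64):
    """Read up to max_chunks chunks from path; return the time each read took."""
    latencies = []
    # buffering=0 avoids Python's own buffer; the kernel page cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(max_chunks):
            start = time.perf_counter()
            data = f.read(chunk_size)
            elapsed = time.perf_counter() - start
            if not data:  # end of file or device
                break
            latencies.append(elapsed)
    return latencies


def find_slow(latencies, slow_factor=10.0):
    """Return the indices of reads slower than slow_factor times the median."""
    if not latencies:
        return []
    median = sorted(latencies)[len(latencies) // 2]
    if median <= 0:
        return []
    return [i for i, t in enumerate(latencies) if t > slow_factor * median]
```

Something like find_slow(time_reads("/dev/sdX")) then lists the chunks
that stalled, with /dev/sdX standing in for your drive. A few outliers
prove nothing on their own; many of them on an otherwise idle machine
are a reason to look at the SMART data.
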
> 
> Lessons learned:
> 
> - if your machine is running really slowly, try a reboot -- if it 
> reboots properly, but takes 2 times as long (or longer) to shutdown and 
> then come back up -- get very suspicious (if your patience lasts that long)
> 
> - if it's a laptop - pull the battery and try again - if everything is 
> normal, buy yourself a new battery
> 
> - if it's a server - try booting from a liveCD (if you can, first 
> disconnect the hard drive entirely) - if normal then you could well have 
> a hard drive problem (or you could have a virus)
> 
> - install SMART utilities and run "smartctl -A /dev/<your drive>" -- the 
> first line is usually the "raw read error" rate -- if the value (last 
> entry on the line) is anything except 0, that's the sign that your drive 
> is failing; if it's in the 1000s, failure is imminent. It's just that 
> your drive's internal software is hiding it from you - replace it!
> 
> - if you're running RAID, be sure to purchase "enterprise" drives 
> ("desktop" drives try very hard to read a sector, despite the delay; 
> enterprise drives give up quickly, as they expect failure recovery to be 
> handled by RAID)
> 
> - you would expect software raid (md) to detect slow drives, mark them 
> bad, and drop them from an array -- nope, md does not keep track of delay
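
The smartctl check from the list above can be scripted. A minimal sketch,
assuming the usual ten-column attribute table that "smartctl -A" prints
for ATA drives; the function names are made up for illustration. Keep in
mind that the raw value is vendor-specific: on some drives a large
non-zero raw read error rate is normal, so compare it with the normalized
VALUE and THRESH columns before condemning a disk.

```python
import subprocess


def parse_smart_attributes(text):
    """Map attribute name to raw value string from 'smartctl -A' output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is everything from the 10th column onwards.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def raw_read_error_rate(text):
    """Return the raw read error rate as an int, or None if it is not reported."""
    raw = parse_smart_attributes(text).get("Raw_Read_Error_Rate")
    if raw is None:
        return None
    first = raw.split()[0]
    return int(first) if first.isdigit() else None


def check_drive(device):
    """Run 'smartctl -A' on device (needs smartmontools, usually root)."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes as status bits
    )
    return raw_read_error_rate(result.stdout)
```

check_drive("/dev/sdX"), with /dev/sdX as a placeholder for your drive,
returns the number to watch over time; a value that keeps climbing says
more than a single reading.
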
> 
> and, not really relevant for Debian, but a direct offshoot of learning 
> the above lessons:
> 
> - if you're running a Mac or Windows, your system may be reporting 
> "smart status good" - but it's not really true - it's not looking at raw 
> read errors
> 
> - there seems to be a bug in the smart utilities for Mac (as available 
> through Macports and Fink) -- the smart daemon will fail periodically, 
> with the only symptom being that every few minutes, your machine will 
> slow to a crawl (spinning beachball everywhere) for 30 seconds or so, 
> then recover --- a really good example of taking a pre-emptive measure 
> that causes a new problem (I can't tell you how long it took to track 
> this one down - what with downloading every performance tracking tool I 
> could find.)
> 
> 
> Miles Fidelman
> 
> -- 
> In theory, there is no difference between theory and practice.
> In<fnord>  practice, there is.   .... Yogi Berra

My Samsung SATA drives have run without failure for a suspiciously
long time so far :). I turn the computer off and on very, very often.
The only bad parts are the SATA connectors; a friend already planned to
solder new SATA connectors onto his mobo. Note! Nobody without experience
in soldering multi-layer boards should do this soldering. I planned to do
it too.

[1] When the heads aren't released anymore after the final click, there
is still a chance to get them working.

- Remove the HDD from the case, but keep the power and data cables
connected.
- With a rubber-headed mallet or something similar, knock against the HDD
from several angles, while rebooting again and again.
- If it doesn't work, repeat this after the HDD has rested for a week.
Dunno why this helps, but it does; perhaps different room temperatures
work like gnomes.

-- Ralf

