Re: is this hard disk failure?
On Tue, 2011-06-07 at 09:02 -0400, Miles Fidelman wrote:
> Ralf Mardorf wrote:
> > For me a hard disc never broke without making a click-click-click noise
> > before it failed, but it's very common for cables and connections to fail.
> >
> >
>
> By the time a disk gets to the click-click-click phase,
A phase everybody knows from modern HDDs :D, but it's possible to get data
even from a disk that won't release the heads anymore [1].
For the Atari I've got a 42MB SCSI drive connected to a Lacom adaptor; it
sometimes needs several boots, but it's unbreakable.
> there has been
> LOTS of warning - it's just that today's disks include lots of internal
> fault-recovery mechanisms that hide things from you, unless you run
> SMART diagnostics (and not just the basic "smart status" either).
>
> For example, if you have a machine that's suddenly running VERY slowly
Correct! Or rather: if voodoo seems to be affecting your machine, it's
seldom voodoo, but a broken HDD.
> -
> it's a good sign that a drive is experiencing internal read errors (unless
> it's a laptop - a shorted battery is a good suspect). Both are lessons
> learned the hard way, and not forgotten.
>
> Turns out that modern drives have onboard processors that retry reads
> multiple times - good for protecting data if you only have the one copy
> on that drive, at the expense of slower disk access. Not so good if:
>
> a. you don't notice that it's happening (the disk will eventually fail
> hard), or,
>
> b. you're running RAID - instead of the drive dropping out of the array,
> the entire array slows down as it waits for the failing drive to
> (eventually) respond
>
> In either case, you'll tear your hair out trying to figure out why your
> machine is running slowly (is it a virus, a file lock that didn't
> release, etc., etc., etc.).
>
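One rough way to check whether a drive is stalling on reads is to time a
handful of small reads spread across it. Below is a minimal Python sketch
of that idea. The function names (sample_read_latency, summarize) and the
0.5 second threshold are my own assumptions, not anything from a standard
tool. Reading a block device such as /dev/sda normally needs root, and
reads served from the page cache will look fast no matter how the drive is
doing, so treat the numbers as a hint only.

```python
import os
import sys
import time


def sample_read_latency(path, samples=32, block_size=4096):
    """Time `samples` reads of `block_size` bytes spread evenly across `path`.

    Returns a list of per-read durations in seconds.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of regular files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        if size < block_size:
            raise ValueError("target is smaller than one block")
        span = size - block_size
        durations = []
        for i in range(samples):
            # Spread the offsets over the whole target, aligned to block_size.
            offset = (span * i // max(samples - 1, 1)) // block_size * block_size
            start = time.monotonic()
            os.pread(fd, block_size, offset)
            durations.append(time.monotonic() - start)
        return durations
    finally:
        os.close(fd)


def summarize(durations, slow_threshold=0.5):
    """Return (worst duration, number of reads slower than slow_threshold).

    The 0.5 s default is an arbitrary assumption for illustration.
    """
    worst = max(durations)
    slow = sum(1 for d in durations if d > slow_threshold)
    return worst, slow


if __name__ == "__main__" and len(sys.argv) > 1:
    worst, slow = summarize(sample_read_latency(sys.argv[1]))
    print(f"worst read: {worst:.3f}s, reads over threshold: {slow}")
```

Saved under a name of your choice and run as root with the device path as
the only argument, it prints the slowest read and how many reads crossed
the threshold. A healthy drive should answer each read in milliseconds;
reads that take whole seconds fit the retry behaviour described above.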
> Lessons learned:
>
> - if your machine is running really slowly, try a reboot -- if it
> reboots properly, but takes 2 times as long (or longer) to shut down and
> then come back up -- get very suspicious (if your patience lasts that long)
>
> - if it's a laptop - pull the battery and try again - if everything is
> normal, buy yourself a new battery
>
> - if it's a server - try booting from a liveCD (if you can, first
> disconnect the hard drive entirely) - if normal then you could well have
> a hard drive problem (or you could have a virus)
>
> - install SMART utilities and run "smartctl -A /dev/<your drive>" -- the
> first line is usually the "raw read error" rate -- if the value (last
> entry on the line) is anything except 0, that's a sign that your drive
> is failing; if it's in the 1000s, failure is imminent, it's just that
> your drive's internal software is hiding it from you - replace it!
>
> - if you're running RAID, be sure to purchase "enterprise" drives
> ("desktop" drives try very hard to read a sector, despite the delay;
> enterprise drives give up quickly as they expect failure recovery to be
> handled by RAID)
>
> - you would expect software raid (md) to detect slow drives, mark them
> bad, and drop them from an array -- nope, md does not keep track of delay
>
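The smartctl check from the list above can be scripted. Here is a minimal
Python sketch that pulls the raw value (tenth column) of the
Raw_Read_Error_Rate line out of the "smartctl -A" attribute table. The
helper names (raw_values, read_error_raw) are made up for illustration. It
assumes the classic ATA attribute table layout that starts with an
"ID# ATTRIBUTE_NAME" header line. One caveat: as far as I know some
vendors, Seagate for example, pack several counters into this raw value,
so a large number there is not automatically a failing drive.

```python
import subprocess
import sys


def raw_values(smartctl_output):
    """Map attribute name -> raw value string from `smartctl -A` text output."""
    values = {}
    in_table = False
    for line in smartctl_output.splitlines():
        fields = line.split()
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"]:
            in_table = True
            continue
        if in_table and len(fields) >= 10 and fields[0].isdigit():
            # The raw value is the tenth column; it can contain spaces,
            # so keep the rest of the line together.
            values[fields[1]] = " ".join(fields[9:])
    return values


def read_error_raw(smartctl_output):
    """Return the Raw_Read_Error_Rate raw value as an int, or None."""
    raw = raw_values(smartctl_output).get("Raw_Read_Error_Rate")
    if raw is None:
        return None
    first = raw.split()[0]
    return int(first) if first.isdigit() else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # smartctl uses its exit status as a bit mask, so a non-zero status
    # is not treated as an error here.
    result = subprocess.run(
        ["smartctl", "-A", sys.argv[1]], capture_output=True, text=True
    )
    print(read_error_raw(result.stdout))
```

Run as root with the device path as the only argument; it prints the raw
value, or None if the drive doesn't report that attribute.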
> and, not really relevant for Debian, but a direct offshoot of learning
> the above lessons:
>
> - if you're running a Mac or Windows, your system may be reporting
> "smart status good" - but it's not really true - it's not looking at raw
> read errors
>
> - there seems to be a bug in the smart utilities for Mac (as available
> through Macports and Fink) -- the smart daemon will fail periodically,
> with the only symptom being that every few minutes, your machine will
> slow to a crawl (spinning beachball everywhere) for 30 seconds or so,
> then recover --- a really good example of taking a pre-emptive measure
> that causes a new problem (I can't tell you how long it took to track
> this one down - what with downloading every performance tracking tool I
> could find.)
>
>
> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In<fnord> practice, there is. .... Yogi Berra
My Samsung SATA drives have run without failure for a suspiciously long
time so far :). I very, very often turn the computer off and on.
The only bad part is the SATA connectors; a friend already plans to solder
new SATA connectors onto his mobo. Note! Nobody without experience in
soldering multi-layer boards should attempt this. I planned to do
it too.
[1] When the heads aren't released anymore after the final click, there
is still a chance to get them working.
- Remove the HDD from the case, but keep the power and data cables
connected.
- With a rubber-headed mallet or something similar, knock against the HDD
from several angles, while rebooting again and again.
- If it doesn't work, repeat this after the HDD has rested for a week.
Dunno why this helps, but it does; perhaps different room temperatures
work like gnomes.
-- Ralf