
Re: detecting bad RAID disk



On Wed, Sep 12, 2007 at 07:58:45AM -0700, michael@estone.ca wrote:
> Quoting Mark Copper <mcopper@titaninterface.com>:
> 
> >On Mon, Sep 10, 2007 at 09:46:47AM -0700, tabris wrote:
> >>Mark Copper wrote:
> >>> Dear Users,
> >>>
> >>> I have an Intel machine on which I installed software RAID 1 using a
> >>> Knoppix trick back in January of last year:
> >>>
> >>> # uname -a
> >>> Linux deneb 2.6.15 #1 SMP PREEMPT Thu Jan 5 18:12:48 EST 2006 i686
> >>> GNU/Linux
> >>>
> >>> The machine suffered occasional kernel panics which, upon removal from
> >>> the data center where I colocated it, I have not been able to reproduce.
> >>> However, I do notice occasional "hesitations" involving disk writes that
> >>> I felt were somehow related to the panics.  There was also a post to
> >>> kernel.org at the time where in a similar setup kernel panics were
> >>> traced to a bad hard disc.
> >>>
> >>> So, I'm thinking simply to replace both hard drives.
> >>>
> >>> Is this foolish?  Is there a better approach not requiring special
> >>> equipment to diagnosing the problem?
> >>>
> >>> thanks.
> >>>
> >>> Mark
> >>>
> >>try smart-tools. a) it can tell the disc to test itself b) it can tell
> >>you what the hard-drive thinks about itself (don't pay too much attn to
> >>"PASSED" b/c that's just a 24 hour warning)
> >>
> >>    And yes, it does work with SATA drives, it just needs the '-d ata' 
> >>    hint
> >
> >Thank you for this.  My discs get a clean bill of health from SMART.
> >
> >So I'm left with these hesitations I don't understand.  These happen
> >with simple bash commands (ls, man, mv) as well as delivery of web
> >pages.  For instance, I just waited nearly 30 seconds for "man" to
> >return, but only when the given command has not been used for a while.
> >
> >Is there some aspect to disk access that SMART does not test?
> >
> 
> Have you tried upgrading your kernel to the latest stable release?
> 2.6.15 is old these days, and if I remember correctly, may have had  
> some memory bugs.
> 2.6.18 is Debian's stable release.

This seems to do the trick (isn't kernel-package wonderful?).  Thank you.
Apparently, the Seagate Barracuda 7200.7 drives are especially sensitive
to heat.  So maybe the two factors together, the old kernel and
overheating (the data center personnel refused to unblock the top vents
on this unit), explain the kernel panics, and maybe I have a usable
machine now :)
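
For anyone who wants to check the heat theory on their own drives, here
is a rough sketch of how the temperature could be read from SMART.  It
assumes smartmontools is installed, that the disk is /dev/sda, and that
the drive reports attribute 194 (Temperature_Celsius) with the raw value
in the tenth column of the attribute table; not every drive does.  The
drive_temp function name is just something made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# raw value of SMART attribute 194 (Temperature_Celsius), if present.
# Assumes the usual attribute table layout where RAW_VALUE is field 10.
drive_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Usage, as root (the device name /dev/sda is an assumption):
#   smartctl -d ata -A /dev/sda | drive_temp
#
# Related checks mentioned earlier in the thread:
#   smartctl -d ata -t long /dev/sda       # start an extended self-test
#   smartctl -d ata -l selftest /dev/sda   # read the self-test log later
```

If the function prints nothing, the drive probably does not expose
attribute 194 and the full "smartctl -A" table is worth reading by hand.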

Mark


