Re: IDE Hard Drive maintenance

To: Lauchlin Wilkinson <lauchlin_list@intas.net.au>
Cc: Debian ISP <debian-isp@lists.debian.org>
Subject: Re: IDE Hard Drive maintenance
From: Michel Lanners <michel.lanners@Cetrel.LU>
Date: Tue, 06 May 2003 17:46:02 +0200
Message-id: <3EB7D8BA.C9AFBFBC@cetrel.lu>
References: <1041910293.21153.15.camel@snafu.intas.office>

This electronic message is not binding on its sender nor on Cetrel S.C.
Any use of information of this mail except the use by the addressee
within his or her business relation with Cetrel is strictly forbidden
CETREL S.C. L-2956 Luxembourg; Tel: 00352 35566-1; http://www.cetrel.lu
======================================================================= 

Sorry for replying to such old mails, but I'm cleaning my mailboxes
:-)

Lauchlin Wilkinson wrote:
> I was wondering what most people on the list did when it came to keeping
> tabs on the health of IDE hard drives?  I have a server in a remote
> location that I fear has one HD that is going flaky. Is there a way of
> doing a bad block scan on a mounted partition safely or am I asking the
> impossible?

For monitoring, use SMART via smartmontools (it's in unstable, but it
recompiles easily for stable). It is a more advanced successor to the
smartsuite package that is in stable.

Some things to watch out for:

- enable SMART on your drives. Some may have it disabled by default.
- enable automatic offline tests. These are non-destructive and
non-captive, i.e. they run in the background while the drive stays in
use. On servers that are not too busy, the extra disk load should not
be a problem. YMMV. I don't know whether the interval is the same for
all disks, but I have one at home that runs a test every 4 hours.
- configure smartd to send email when it detects problems.
- keep an eye on the SMART error logs of the drives.
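As a rough sketch of the one-off setup and the error-log check, assuming Python 3.7+ and the smartctl binary from smartmontools on the PATH (the helper names are made up for illustration), these are the smartctl invocations involved. The email alerts are configured separately in smartd.conf (its -m directive), not via smartctl.

```python
import subprocess

def smart_setup_commands(device):
    """Return the smartctl calls that enable SMART, automatic offline
    testing and attribute autosave on `device`, e.g. "/dev/hda"."""
    return [
        ["smartctl", "-s", "on", device],  # enable SMART itself
        ["smartctl", "-o", "on", device],  # enable automatic offline tests
        ["smartctl", "-S", "on", device],  # enable attribute autosave
    ]

def smart_error_log_command(device):
    """Return the smartctl call that prints the drive's SMART error log."""
    return ["smartctl", "-l", "error", device]

def run(cmd):
    """Run one command (needs root) and return (exit status, stdout).

    Note: smartctl's exit status is a bit mask, so a non-zero value
    needs to be decoded rather than treated as a plain failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout
```

Usage would be something like `for cmd in smart_setup_commands("/dev/hda"): print(run(cmd))`, then calling `run(smart_error_log_command("/dev/hda"))` from cron or by hand.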

If something appears in the logs, you will also see the block address
there, but it may not be obvious how to map it to a filesystem or
device block as you see it 'from the outside'. You can run badblocks
(read-only by default) on the partition to find that info.
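One step in that mapping is working out which partition an absolute sector number from the SMART log falls into. A minimal sketch, assuming you copy the partition start sectors and sizes by hand from the partition table (for instance `fdisk -lu`); the partition values in the usage note are invented for illustration.

```python
def locate_lba(lba, partitions):
    """Find the partition that contains absolute sector `lba`.

    `partitions` is a list of (name, start_sector, size_in_sectors)
    tuples. Returns (name, offset), where offset is the sector number
    relative to the start of that partition, or None if the sector lies
    outside every listed partition."""
    for name, start, size in partitions:
        if start <= lba < start + size:
            return name, lba - start
    return None
```

With the hypothetical table `[("hda1", 63, 208782), ("hda2", 208845, 39102336)]`, sector 250000 maps to `("hda2", 41155)`, i.e. sector 41155 counted from the start of hda2.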

Also be careful when translating block numbers. badblocks reports
blocks in the block size it was run with (-b; the default is 1K), so
give it the filesystem's block size (recorded in the superblock) if you
want filesystem block numbers. The kernel log will show device blocks
(i.e. 512-byte blocks), and something else I forget will show 1K
blocks. Just be sure you get it right....
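The arithmetic between those units is simple once the sector number is relative to the partition. A small sketch, assuming 512-byte device sectors and a filesystem block size you have looked up yourself (the function names are hypothetical):

```python
SECTOR = 512  # bytes per device sector, as used in the kernel log

def sector_to_fs_block(sector_in_partition, fs_block_size):
    """Filesystem block that contains a partition-relative 512-byte sector."""
    return sector_in_partition * SECTOR // fs_block_size

def fs_block_to_sectors(fs_block, fs_block_size):
    """Range of partition-relative 512-byte sectors covered by one fs block."""
    per_block = fs_block_size // SECTOR
    first = fs_block * per_block
    return range(first, first + per_block)

def sector_to_kib_block(sector_in_partition):
    """1K block number that contains a partition-relative 512-byte sector."""
    return sector_in_partition * SECTOR // 1024
```

For example, sector 41155 on a filesystem with 4096-byte blocks lies in filesystem block 5144 (which spans sectors 41152 to 41159) and in 1K block 20577.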

A trick to make disks with isolated read errors behave again (although
you shouldn't trust them too much with important data afterwards...)
is to _write_ to those blocks. This makes the drive controller remap
the bad blocks to good spare blocks. Bingo, the errors disappear. Until
the next one appears :-)
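A minimal sketch of that write, assuming a Unix system (os.pwrite) and a sector number that matches what you open: the absolute LBA for the whole disk (e.g. /dev/hda), or the partition-relative offset for a partition. Whatever was stored there is destroyed, so if the sector belongs to a file, that file is damaged. As an untested assumption: the write goes through the kernel's buffer cache, which may try to read the surrounding block first and hit the same error; if so, overwriting a larger aligned chunk (e.g. the whole filesystem block, via `count`) may work better. Try it on a scratch file before pointing it at a real disk.

```python
import os

SECTOR = 512  # bytes per device sector

def overwrite_sectors(path, first_lba, count=1, sector_size=SECTOR):
    """Overwrite `count` sectors of `path`, starting at `first_lba`,
    with zeros and flush the write to disk.

    WARNING: the previous contents of those sectors are lost.
    Returns the number of bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, bytes(count * sector_size),
                            first_lba * sector_size)
        os.fsync(fd)  # make sure the data actually reaches the drive
    finally:
        os.close(fd)
    return written
```

Afterwards, rerunning the read-only badblocks scan and checking the SMART attributes shows whether the drive really remapped the sector.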

Cheers

Michel

-- 

Michel Lanners              | "Being able to break security
PRO-SSC                     |  doesn't make you a hacker
michel.lanners@cetrel.lu    |  more than being able to hotwire cars
Cetrel S.C.                 |  makes you an automotive engineer."
10, Parc d'Activite Syrdall |
L-5365 Munsbach             |               Eric S. Raymond
