Re: OT: IDE/Bad Blocks: Call for mental assistance

To: debian users <debian-user@lists.debian.org>
Subject: Re: OT: IDE/Bad Blocks: Call for mental assistance
From: Alex Malinovich <demonbane@the-love-shack.net>
Date: 13 Jul 2003 12:03:01 -0500
Message-id: <1058115781.7637.25.camel@Thief>
In-reply-to: <20030713115123.GA24224@diamond.madduck.net>
References: <20030713115123.GA24224@diamond.madduck.net>

On Sun, 2003-07-13 at 06:51, martin f krafft wrote:
> Folks,
> 
> Over the past year, I have replaced something around 20 IDE
> Harddrives in 5 different computers running Debian because of drive
> faults. I know about IDE and that it's "consumer quality" and no
> more, but it can't be the case that the failure rate is that high.

I've been using IDE drives almost exclusively for over a decade on all
of my machines with no problems. This includes 3 servers which were up
for 12+ months at a time. I also ran a single server using a SCSI drive
a couple of years ago. The SCSI drive is the only one that's ever died
in a server box.

I have replaced 4 IBM drives in 3 years due to hardware faults cropping
up after a year or so, but the majority of the time the drive was in a
very dirty, very poorly ventilated machine.

--snip--
> The other day, I received a replacement drive from Hitachi, plugged
> it into a test machine, ran badblocks and verified that there were
> no badblocks. I then put the machine into a firewall, sync'd the
> data (ext3 filesystems) and was ready to let the computers be and
> head off to the lake... when the new firewall kept reporting bad
> reloc headers in libraries, APT would stop working, there would be
> random single-letter flips in /var/lib/dpkg/available (e.g. swig's
> Version field would be labelled "Verrion"), and the system kept
> reporting segfaults. I consequently plugged the drive into another
> test machine and ran badblocks -- and it found more than 2000 -- on
> a drive that had non the day before.

This is definitely a Bad Thing (tm). :) Getting 10 or 100 bad blocks
might not be that big of a concern (though having it happen on the same
day would concern me), but 2000 is quite serious.
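
To put a number on how quickly a drive is degrading, the block lists from
two badblocks runs can be compared. What follows is only a minimal
sketch: it assumes each run was saved with badblocks' -o option (one
block number per line), and the function names and sample numbers are
made up for illustration.

```python
def read_blocks(lines):
    """Parse a saved badblocks list: one decimal block number per line."""
    return {int(line) for line in lines if line.strip()}


def compare_runs(before, after):
    """Return (new, persistent, vanished) block sets between two runs."""
    return after - before, after & before, before - after


# Illustrative data only; in practice pass open file objects instead.
before = read_blocks(["1024\n", "2048\n"])
after = read_blocks(["1024\n", "4096\n", "8192\n"])
new, persistent, vanished = compare_runs(before, after)
print(f"{len(new)} new, {len(persistent)} persistent, {len(vanished)} vanished")
```

A list that keeps growing from one run to the next points at a drive (or
something in front of it) that is actively getting worse, rather than a
fixed set of defects.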

> Just now, I got another replacement from Hitachi (this time it
> wasn't a "serviceable used part", but a new drive), and out of the
> box, it featured 250 bad blocks.
> 
> My vendor says that bad blocks are normal, and that I should be
> running the IBM drive fitness test on the drives to verify their
> functionality. Moreover, he says that there are tools to remap bad
> blocks.
> 
> My understanding was that EIDE does automatic bad sector remapping,
> and if badblocks actually finds a bad block, then the drive is
> declared dead. Is this not the case?

Some drives can, in fact, do bad sector remapping on the fly. However,
manually finding bad blocks on a drive is no real cause for concern.
When a bad sector is found, it should be marked as such and the FS
should not use it afterwards.

--snip--
> So I call to you and would like to know a couple of things:
> 
>   - does anyone else experience this?
>   - does anyone know why this is happening?
>   - is it true that bad blocks are normal and can be handled
>     properly?
>   - why is this happening to me?

Lunar phases, astronomic alignment, a discontented former associate
turned witch doctor, a novice voodoo practitioner practicing on HD dolls
instead of human ones, or maybe sunspots. Why do any bad things happen
to any good people? :)

>   - can bad blocks arise from static discharge or impurities? when
>     i replace disks, I usually put the new one into the case
>     loosely and leave the cover open. The disk is not subjected to
>     any shocks or the like, it's sitting still as a rock, it's just
>     not affixed.

Bad blocks can occur due to static discharge and/or impurities, but what
I'd be looking at is the fact that the drive isn't affixed to the case.
Elementary physics tells us that any vibrations produced by the drive
will primarily be reflected back at it rather than dissipated throughout
the computer chassis. I am not about to cast the proverbial first stone,
since I have done the same thing at times. In my experience, though,
I've only done it with drives that were dying anyway, and they always
seemed to have an accelerated rate of death from that point on. Again, I
can't state this as absolute truth, just as my own personal experience.

The other thing that I would be looking at is possibly your motherboard
or (if you are using one) PCI IDE controller. You could conceivably be
running a system that is, through some combination of hardware and
software, chewing up your HDs.

IDE drives can be a great alternative for companies looking to save
money. I have a friend whose employer recently purchased 8
consumer-grade PCs to replace 8 aging Sun servers, saving somewhere
around $10,000 (US) per machine. They then put in huge RAID-5 arrays and
went to work, with no problems.

p.s. As a side note, I've been using IBM drives exclusively for about 2
years now, and for the last year or so I've been running ReiserFS on all
of them. I don't know if this is still the case, but I know that at one
point ReiserFS would choke upon encountering a bad sector. To this day,
I have yet to have a single problem.

-- 
Alex Malinovich
Support Free Software, delete your Windows partition TODAY!
Encrypted mail preferred. You can get my public key from any of the
pgp.net keyservers. Key ID: A6D24837
