
Lost Interrupt and Dead HD



Hi folks, and happy new year!

We have seen people posting about the kernel error message of
"lost interrupt" (or something similar) reasonably regularly, and then
people usually post back saying 'your HD is about to die, replace it',
and everyone then goes on with their lives.

I too have had this error, and assumed that the HD was just going bad
(which was confirmed by doing a BIOS low-level format). No-one likes it
when a HD dies, but everything has a useful life-span.

However, I recently installed Debian onto a machine where this error
again showed up, twice, on different drives (one oldish, one brand new,
from different manufacturers). The larger, newer drive was the
first to go: the drive made weird noises and spat out the error many
times. When I got around to running fsck on the drive, most of the
contents had ended up in lost+found, and I basically had to write the
installation off.

I tried to convince the owner of said computer that it might have been a
faulty drive (there is an identical one running in a Mandrake 6.1
machine which has had no problems). I then took the smaller, older drive
in the same machine and reinstalled onto that (I had been using this
smaller drive as the root drive, with /home and /var on the larger,
newer drive). After a few hours of reinstalling everything (the machine
had not yet gone into production, so there was no backup), and
installing the latest kernel (2.2.13; 2.2.12 had been running when the
drive died), everything seemed happy.

I get back from my New Year's camping trip, and about an hour later I get
a call to say that the machine has died again, this time with the other
(older) drive. Same symptoms: the owner turned off the computer when
the drive went nuts (those who have heard it know it's not a pretty sound
to hear from a delicate piece of electronics!), and although I have yet
to see it in person, I'm not going to be surprised if this drive is cactus
as well (this time we have a backup).

So, what do we have? Two drives: different sizes, different ages,
different manufacturers, one the primary master, the other the secondary
master. Two different kernels, both with CONFIG_IDEDMA_AUTO=n (which
some have suggested might help), both with all the various IDE chipset
workarounds enabled. An extremely vanilla installation of Debian 2.1
(with all the latest official add-ons, and none of the unofficial ones).
We have an identical drive working flawlessly in both other Linux and
Windows NT machines. The same machine (with the same drives) also ran
with no problems under an NT installation. In both cases, the problem
manifested after the machine had been running 24/7 for a few weeks (not
sure exactly how long, but at least 14 days).

It is *possible* that both drives were faulty and about to die anyway,
but it looks very unlikely to me that they would both die in the same
machine in the same way.

Unfortunately I don't have the make and model of the motherboard with
me, but it was running a Cyrix 266 chip in a fairly generic motherboard,
the same combination we run in other machines with no problems that I am
aware of. If I had to guess, I'd say that maybe the kernel didn't like
the IDE controller (don't know make/model again), but it sounds like a
pretty lame excuse when other OSs didn't have any problem with it.

This machine was to be our new main server, running mail, DNS, web, PPP,
firewall, all the mod cons. I managed to successfully argue for running
Debian, because if I was administering it, I wanted something I knew
well. Of course, since we have never had a problem with any of our other
RedHat or Mandrake boxes, Debian is being singled out as the culprit.
I'm being told I should install RedHat and forget Debian, as it's the
cause of all the woes in the world. I'd be very surprised if anything
particular to Debian was the problem; it's more likely to be a kernel issue, I
would think, which means it's distribution-independent. But if I don't
come up with a solution soon, it's going to be back to RedHat (or worse
still, NT)...

Switching MBs/machines might be a solution, but the sad fact is that if I
have to use a new MB, I'm going to be going back to a P100 or something,
which is not an acceptable solution, as far as I'm concerned.

I *know* this issue has come up before, and I'm pretty sure no-one has
suggested a plausible solution other than 'dump the hardware'. Should I
just swap motherboards, go back to an underpowered machine (yes, it's
all relative, I know, but I've had to fight to get good hardware for the
Linux servers)? Is there a chance it's Debian-related?

Any suggestions will be greatly appreciated.

cheers,

damon

-- 
Damon Muller (dm-sig6@empire.net.au) /  It's not a sense of humor.
* Criminologist                     /  It's a sense of irony
* Webmeister                       /  disguised as one.
* Linux Geek                      /     - Bruce Sterling 

