
Re: How to diagnose kernel panic?



Hi Mark.

	I don't know if this kind of information will help at all, but what are 
the specs of your machine? Specifically, do you have a quality power supply? 
How about your hard drive and your motherboard? As I said, I don't know if 
answering these questions will reveal anything important, but it always helps 
to verify that you are using quality parts in your machine. 

	After all, a software program is just a sequence of machine instructions 
for your CPU (usually compiled from a high-level language such as C++). If a 
piece of software executes an instruction that asks the hard disk for data, 
and the motherboard and/or the hard disk are cheapies that fail to return the 
data the instruction was expecting, that can certainly cause software bugs 
ranging from incorrect display of data to kernel panics, depending on which 
program is unlucky enough to receive the bad data. (Cheap motherboards and 
hard disks are cheap because they have less redundancy and fault tolerance, 
and they use components that are more likely to fail to begin with.) Also, if 
your power supply is a cheap one, it might not deliver enough power to your 
computer, and if that happens, your computer simply won't work correctly, 
because both your software and your hardware expect full, stable power in 
order to work correctly. 
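	If you want a crude way to look for that kind of failure from user space, 
a sketch like the following reads the same file several times and compares 
the SHA-256 hash of each pass; on healthy hardware every pass should give the 
same hash. I'm assuming you have Python on the box, and the function names 
and the default of three passes are just my own choices for illustration, not 
any standard tool. One caveat: on Linux, repeated reads of a file that fits 
in memory are normally served from the page cache, so unless the file is 
larger than your RAM (or you drop the caches between runs), this mostly 
exercises memory and the motherboard rather than the disk itself.

```python
import hashlib
import sys


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def reads_are_consistent(path, passes=3):
    """Hash the same file `passes` times (hypothetical helper, arbitrary default).

    Differing digests mean the machine returned different data for the same
    file, which points at flaky hardware. Note: after the first pass, Linux
    usually serves a file that fits in RAM from the page cache, so later
    passes test memory more than the disk.
    """
    digests = {file_digest(path) for _ in range(passes)}
    return len(digests) == 1


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    if reads_are_consistent(target):
        print("consistent: every pass returned the same data")
    else:
        print("MISMATCH: reads of the same file returned different data")
```

Saved under any name (say, check_reads.py), you would run it as 
"python3 check_reads.py /path/to/some/large/file", ideally in a loop while 
the machine is under load. A single clean run proves nothing, but a mismatch 
would be a strong hint that the hardware is at fault.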

	Hope all that helps.


On Sunday 09 July 2006 12:56, Mark Copper wrote:
> I have a server that is brought down by a kernel panic every two weeks
> on average.  Nothing untoward gets in the logs and the on-screen panic
> message starts with something like
>    Kernel panic - not syncing: Fatal exception in interrupt
>
>    Call trace:
>    [<c026bc42>] scsi_request_fn+0xf6/0x294
> I wasn't able to get any more at the data center...
>
> So I brought the machine home and am running folding@home on it and so
> far I have not been able to induce the panic.  The replacement machine
> is similar, but not identical.  The main difference being a switch from
> software to hardware RAID1.  Also, the new machine, except for the
> hardware driver, uses stable while the problematic machine uses testing.
> And the replacement has run so far without problem.
>
> The only other thing I can add is that the bad machine would seem to
> start getting "sluggish" before it froze, but for the life of me, I
> couldn't see why.
>
> I am posting because I'm hopeful that list participants might have
> suggestions how I might start to chase down or, better yet, eliminate
> this problem.
>
> Is there a way, perhaps, to manufacture the possible interrupts that
> occur?
>
> Thanks.
>
> Mark


