Re: Frequent system crash -- with gcc3.2 (?)

Oliver Elphick <olly@lfix.co.uk> writes:
> I am troubled by frequent hard system crashes, which seem particularly
> likely to occur when I am using gcc3.2.  Since I often want to rebuild
> large packages, this is extremely annoying.  At the moment, it's as bad
> as Windows 95!

This is almost certainly a hardware problem.  Oddly, "gcc" is an
excellent hardware test---building the kernel or another big package a
few dozen times will frequently uncover hardware problems (as GCC
processes lock up, crash with "signal 11" errors, or otherwise behave
strangely) that more traditional memory and CPU tests miss.
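If you want to automate the "build it a few dozen times" test, here is a
rough sketch in Python.  It is only an illustration: the function names
(repeat_build, looks_intermittent) are made up for this example, and the
build command is whatever you normally run.  The idea is that the same
build passing on some runs and failing on others points at hardware
rather than at the source tree.

```python
import subprocess
import sys


def repeat_build(command, runs):
    """Run `command` `runs` times and return the list of exit statuses.

    `command` is an argument list such as ["make", "bzImage"] (an
    assumption; substitute your own build command).  A hung compiler
    will simply stall this loop, which is itself a useful symptom.
    """
    statuses = []
    for i in range(runs):
        result = subprocess.run(command)
        statuses.append(result.returncode)
        print(f"run {i + 1}: exit status {result.returncode}", file=sys.stderr)
    return statuses


def looks_intermittent(statuses):
    """True if some runs succeeded and others failed."""
    passed = any(status == 0 for status in statuses)
    failed = any(status != 0 for status in statuses)
    return passed and failed
```

Remember to clean the tree between runs (for example "make clean") so
that every iteration really recompiles everything.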

> A related symptom is that gcc will freeze while compiling a particular
> object file -- not always the same one, though the same one is
> moderately likely to be picked on a subsequent run.  If I press
> control-c the compilation terminates, but an unkillable cc process is
> left in the background.
> I notice that the process is marked as running.  However I believe that
> this is false, because there are 2 other processes running (on a 2
> processor system) and I believe that there cannot be more than one per
> processor.  (Is that correct?):

The comments H. S. Teoh made are technically right.  However, when you
see an unkillable process permanently stuck in the "R" state on a dual
processor machine, it's often because one of the processors went off
into user space and never came back.  If, when this happens, you "cat
/proc/interrupts" a few times, you'll discover that one of the columns
of numbers remains fixed---that processor has simply died.
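The same check can be scripted instead of eyeballing the output.  The
following is a minimal sketch, assuming the usual /proc/interrupts
layout (a header row of CPU names, then one row per interrupt source
with one count per CPU); the function names are invented for this
example.  It sums each CPU's column in two snapshots and reports the
CPUs whose total did not change.

```python
import time


def per_cpu_totals(text):
    """Sum the interrupt counts in each CPU column of /proc/interrupts."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()              # e.g. ["CPU0", "CPU1"]
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        _, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()[:len(cpus)]
        # Skip rows such as "ERR:" that do not carry one count per CPU.
        if len(fields) < len(cpus) or not all(f.isdigit() for f in fields):
            continue
        for cpu, field in zip(cpus, fields):
            totals[cpu] += int(field)
    return totals


def stalled_cpus(before, after):
    """Return the CPUs whose total did not change between two snapshots."""
    return [cpu for cpu in before if after.get(cpu) == before[cpu]]


def watch(seconds=5, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return the stalled CPUs."""
    with open(path) as f:
        before = per_cpu_totals(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = per_cpu_totals(f.read())
    return stalled_cpus(before, after)
```

Calling watch() while the stuck cc process is around should name the
dead processor, if there is one.  Treat an empty result or a single hit
as a hint only: how interrupts are spread across processors depends on
the kernel and the hardware, so repeat it a few times.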

If the processor dies in the kernel while holding a lock, the other
processor will eventually freeze waiting for that lock.  Depending on
how this happens, the waiting processor will normally still respond to
interrupts, so you'll be able to use Alt-SysRq keys and such.  If you
try Alt-SysRq-B, it'll eventually hang as the kernel tries to regain
control of the other (dead) processor.

Try running your compilations with only one of the two processors
installed.  You may find that one of them is flakey.

Kevin <buhr@telus.net>
