
Bug#648766: [sparc] BUG: NMI Watchdog detected LOCKUP on CPU0



So, what have I learned after lots of test cases?

With SMP on or off, and with the nouveau driver loaded or not, I get the same unstable behavior and crashes on Linux kernels 3.2.13, 3.2.14 and 3.3.1. I tested with only one CPU installed and with both CPUs installed, with SMP on and off, with the NVIDIA graphics card installed and active, with the XVR-1200 graphics card installed and active, and with no graphics card at all. I still get the same errors. I can confirm that the system is very likely to crash when the hard drive is read heavily. The system can run memory tests for 16+ hours at 100% utilization on both CPUs without error, but the moment you run cat /dev/sda > /dev/null you will normally see a kernel panic within a few minutes.

I have one more idea for a test: plugging in a PATA disk drive, installing Linux on it, and seeing whether these kernel versions still crash, since the current hard drives are SCSI.

Now, with that said, I can't seem to crash the 2.6.32 kernel in the same way with SMP off. I haven't tried it with SMP on yet, but I have a feeling that will work fine as well. So this looks like some sort of regression in the Linux kernel, which is very sad. I have a lot more images of various kernel panic traces, but they are all very similar to the ones already posted. I am going to start looking into what changes were made to the SPARC-specific parts of the kernel since 2.6.32 and try to isolate something.

This seems like a real kernel bug.

-Kieron

On 04/03/2012 01:29 PM, Jonathan Nieder wrote:
Kieron Gillespie wrote:

Here is the dmesg output from the current system running Linux
3.2.13 with SMP enabled and tickless disabled.
Great.

Is this reproducible without nouveau?   It might be possible to test
by putting

	blacklist nouveau

in /etc/modprobe.d/kg-disable-nouveau.conf and booting in "recovery
mode" so X doesn't get started.  That might mean it continues to
access the console using the PROM or it might mean there is no
console output at all and one has to operate "blind" or using ssh;
please forgive my ignorance.

Thanks,
Jonathan



