
Re: Bug#433187: [Fwd: Re: Fix for sparc64 cpu hangs.]



Dann,

I'll build the snapshot and let some machines run with it.

>  Can you provide reproduction instructions and/or verify the fix I've
> committed? A snapshot[1] build with this fix should appear within 24
> hours - you'll need a build >= r9705.
> 
> [1] http://wiki.debian.org/DebianKernel
>     http://stats.buildserver.net/packages/status.php?email=debian-kernel&packages=&arches=&subdist=kernel-dists
> 

There are a few ways to reproduce the bug. Unfortunately you can't tell
when you'll hit it; it happened at more or less random times. The way
that worked best for me is described in
http://www.mail-archive.com/sparclinux@vger.kernel.org/msg02027.html
If you run stress -c <number of cpus> at the same time, it will hit you
even harder.
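
To avoid hardcoding the CPU count, something like the following should
work. It is only a sketch: stress_cmd is a name made up for this mail,
and it assumes getconf supports _NPROCESSORS_ONLN (true on glibc) and
that the stress tool is installed.

```shell
# Hypothetical helper: print the stress command line for a given CPU
# count. With no argument, use the number of online CPUs from getconf.
stress_cmd() {
    ncpus="${1:-$(getconf _NPROCESSORS_ONLN)}"
    printf 'stress -c %s\n' "$ncpus"
}

# To actually start the load in the background while the reproducer
# from the message above is running:
#   $(stress_cmd) &
```

The function only prints the command, so you can check what would be run
before putting load on the machine.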

Unfortunately it also seems to depend on the CPU type. I never managed to
reproduce it on demand on US II; you just got hit by it after $RANDOM
days, usually with dpkg-query involved. Small US III machines seem to be
affected more, and it's worse still on machines with >= 4 US III CPUs.
But even there it hits you at random times.


Just in case the bug is not fixed completely, it would probably make
sense to add the sysrq-g patches from
http://www.mail-archive.com/sparclinux@vger.kernel.org/msg02019.html
and
http://www.mail-archive.com/sparclinux@vger.kernel.org/msg02022.html
I don't know if they affect anything else, though.


I applied David's patch a few minutes after he posted it. The machine
has been running fine since that day, and even under high load I haven't
managed to crash it again. So at least it makes things _MUCH_ better.


-- 
Bernd Zeimetz
<bernd@bzed.de>                         <http://bzed.de/>


