I've now had a couple of personal reports from QEMU users experiencing
hard lockups during periods of high I/O under Debian wheezy, similar to
the thread at
http://comments.gmane.org/gmane.linux.debian.ports.sparc/16093 (QEMU's
emulation is an approximation of the Ultra 5 machine), which makes me
believe that this could be a kernel bug.
At one point I was given remote access to a system that could reproduce
the issue about 1 in every 10 boots, and from attaching a debugger to
the QEMU session it looked like the wheezy kernel was stuck in a
spinlock, likely waiting for an interrupt that never arrived. It is
of course possible that this is an emulation bug, but it does seem
suspicious that the same problem also appears to be present on a real
Ultra 5.
If I had to hazard a guess, I would say that it's related to either
the older UltraSPARC processors or the PCI/psycho interrupt handling in
the kernel, which may explain why the kernel developers, who likely have
much newer hardware, aren't noticing the issue.
Sadly, as I struggle to reproduce this issue locally, it's almost
impossible for me to say whether this is something that affects all
SPARC64 kernels or just a specific hardware combination, but I'd be very
interested to hear whether you also experience any lockups under high
I/O when testing with more recent kernels (virtio in particular seems
to help trigger the issue under emulation).
ATB,
Mark.