Re: kernel hang

On Tue, 2010-10-26 at 18:39 +0200, Árpád Magosányi wrote:
> Yet another hang.
> I am on a Sun Netra T1. I cannot give you details yet, as the machine
> is hung some 30 km away.
When you have details it might be worth submitting a bug report if you
haven't done so already.

> What I have noticed is that /proc-related processes (like 'w', 'ps')
> hang, but not everything /proc-related does; for example, I could log
> in through ssh and use sudo without problems.
> Strace revealed that 'ps ax' hangs when trying to
> read /proc/nnnn/cmdline (the read() never returned), where nnnn
> is the process ID of an apache process.
> Kernel logs did not reveal anything related.
> (I tried to shut it down, but it also hung, and I killed processes in
> the wrong order.)
That sounds like the kind of thing that should be reasonably easy to
trace.  Do you have a test case that triggers this bug?  If so, there
is a reasonable chance it can be fixed; if not, you'll likely have a
harder time.
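
If it happens again, something along these lines might help narrow down
which processes are affected without hanging your shell the way 'ps'
does.  It is only a rough sketch, not tested on sparc: the names probe()
and find_stuck_cmdlines() and the 2 second timeout are made up for
illustration.  Each read is done in a throwaway child process, so only
the child gets stuck if the read() never returns.

```python
import os
import subprocess
import sys

# Child program: open the path given as argv[1] and read it once.
READER = "import sys; open(sys.argv[1], 'rb').read()"


def probe(path, timeout=2.0):
    """Return 'ok', 'error', or 'timeout' for one read attempt on path."""
    child = subprocess.Popen(
        [sys.executable, "-c", READER, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        status = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Do not wait after kill(): a child stuck in an uninterruptible
        # read may never exit, and waiting would hang this script too.
        child.kill()
        return "timeout"
    return "ok" if status == 0 else "error"


def find_stuck_cmdlines(proc_root="/proc", timeout=2.0):
    """List the PIDs whose cmdline read does not finish within timeout."""
    stuck = []
    for name in sorted(os.listdir(proc_root)):
        if name.isdigit():
            path = os.path.join(proc_root, name, "cmdline")
            if probe(path, timeout) == "timeout":
                stuck.append(int(name))
    return stuck
```

Calling find_stuck_cmdlines() as root on the affected machine should
return the PIDs whose cmdline cannot be read, which would be useful
detail for a bug report.  An 'error' result usually just means the
process exited in the meantime.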

> I have been struggling with mysterious hangs on my various Suns
> (Ultra 2, Netra T1, ...) for years.
Have you submitted bug reports?  Is it always the same problem?  If you
have serial access then you should be able to get OpenBoot to dump the
register values, and the program counter should give you some hint as
to the location of the problem.

>  Shall I forget sparc altogether?
> Change to Ora^H^HpenSolaris?
Without knowing more about what the actual problems are, I'm not sure
that's really answerable.  That said, I might be tempted to point out
that Ultra 2s and Netra T1s are likely to be around 10 years old, if
not older, and were probably designed for an operational life of 3 to
5 years, so your problem might just be hardware aging.

 - Martin

