Re: apparent crashes persist.

hendrik@topoi.pooq.com writes:

> Once again, when it crashes, I can sometimes still manage to use a ssh 
> connection to get in from elsewhere.  What information should I collect, 
> and how should I analyse it?

Have you tried using completely new RAM from a different vendor or of a
different make (e.g. single-sided instead of double-sided or vice
versa)? We had a 256-node cluster where we found that the RAM was
plain incompatible and had to swap 2048 DIMMs for ones from a
different vendor to get any stability. Even then we still had a 5%
failure rate per DIMM in a week of stress testing.


