Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]

On 3/14/21 5:52 PM, John Paul Adrian Glaubitz wrote:
> On 3/14/21 6:48 PM, Frank Scheiner wrote:
>>> So, if, for example, you want to verify that the memory is okay, you should run
>>> a memtest program.
>> ...the built-in (memory) diagnostics of Sun machines are pretty
>> thorough. This is not a PC. :-)
> I doubt that the hardware runs a thorough memory test by default that
> can be compared to a full memtest86 test run.

The probability of a memory hardware fault after the ECC memory tests
done during POST is very low. It is so close to zero that I cannot even
begin to guess how a memory fault would slip past those ECC
diagnostics. Those tests run for quite a while, and I have never seen
evidence of a problem.

    See: https://lists.debian.org/debian-sparc/2021/03/msg00026.html

Regardless, we are just going in circles.

I don't know whether this is a kernel problem or something else. I only
know that something goes terribly wrong, and it may be a
systemd-related problem.

I think Frank Scheiner made some suggestions, and I will try to isolate
the issue.

> Either way, if the kernel breaks for someone, they will have to bisect the
> issue. I don't have any means in bisecting a problem if I cannot reproduce
> it in the first place.

I agree completely.


