
Bug#678443: Hard lockups due to "lockup-detector" (NMIs) on multi-Pentium-3 SMP systems on all kernel builds since 2.6.38



Hello,

Ben Hutchings wrote:
On Thu, 2012-06-21 at 21:25 +0200, Hans-Juergen Mauser wrote:
[...]
Here we see
again how bad the documentation of open-source projects sometimes is
cared about... even when configuring a kernel, the config help says that
the nmi watchdog had to be enabled consciously by a boot parameter

I don't see any documentation saying that; maybe you're looking at the
wrong version.  But thanks for the general criticism anyway, it really
helps to motivate developers.

Sorry, that wasn't meant negatively. I know from my own work that it happens - but on the other hand, as a Linux enthusiast, I often ask myself how an "average" user is supposed to handle this. And you are right, I mixed up two locations: in the current kernel source the config help is correct, but the documentation files are still partly wrong, and that is where I took it from:

http://www.kernel.org/doc/Documentation/nmi_watchdog.txt

- in
fact it seems to be activated by default as soon as SMP code is loaded
and/or an APIC is detected (but though the presence of an APIC, I have
not seen those NMIs on my uniprocessor P3 machines yet).

It actually depends on whether the processor has a PMU (performance
monitoring unit) with a useful counter.

Okay, found at least one system which _does_ "count NMIs" - just for learning I will take a look at the differences between the systems and running kernel versions/configurations.
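
For comparing machines in this way, the per-CPU NMI counters can be read from the "NMI:" line of /proc/interrupts on x86. The following is only a minimal Python sketch of that check; the function name nmi_counts and the SAMPLE text are illustrative assumptions, not output from the systems discussed in this thread.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts-style text.

    Returns an empty list if no "NMI:" line is present.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Counters come first; the trailing description is not numeric.
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []


# Illustrative sample only (made-up numbers), used when /proc is unavailable.
SAMPLE = """\
           CPU0       CPU1
  0:        126          0   IO-APIC-edge      timer
NMI:       4821       4790   Non-maskable interrupts
LOC:     901234     899001   Local timer interrupts
"""

if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    print(nmi_counts(text))
```

Running it twice a few seconds apart shows whether the counters increase, which is what one would expect while the NMI watchdog is active; counters that stay at zero suggest it is not running on that machine.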

[...]

I think it's fine and has nothing to do with the problem.

Since you say it has taken 1-8 days for any problem to appear, I suppose
you will have to wait a few weeks to have some confidence that
'nowatchdog' makes a difference.

That is what I would like to do, and will do; there won't be any other reason to reboot the machine that is hit by the problem most often. As soon as a definite difference (or definitely the same behaviour) is visible, I will post a reply here. In any case, I just wanted to be able to discuss the problem, and I initially posted it as a reply to the bug referenced above, but I was given a hint that I should open a new one.
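
Before starting a test that runs for weeks, it may be worth confirming that the 'nowatchdog' parameter actually reached the kernel. A small Python sketch under stated assumptions: the helper names are hypothetical, and the sysctl file /proc/sys/kernel/nmi_watchdog is assumed to exist on the running kernel (it is reported as unavailable otherwise).

```python
def has_boot_param(cmdline, name):
    """True if `name` appears as a standalone word on the kernel command line."""
    return name in cmdline.split()


def read_text(path):
    """Return the file's contents, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text("/proc/cmdline") or ""
    print("nowatchdog on command line:", has_boot_param(cmdline, "nowatchdog"))

    # Assumption: this sysctl is present; it holds 0 (off) or 1 (on).
    state = read_text("/proc/sys/kernel/nmi_watchdog")
    print("nmi_watchdog sysctl:", state.strip() if state else "unavailable")
```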

At least the bug has done one good thing for me: I have got used to compiling my own kernels again, which I had abandoned with the advent of the 2.6 series and the move to initrds in most distributions; that had made me use only pre-packaged binaries, for consistency across a number of machines and for simplicity. Now I am happy to be able to optimise some details again, or to choose different options than the distribution team.

Best regards,

Hans-Juergen


