
Bug#644550: indefinite soft lockup on rm



> > Attached.  It makes no sense to me.  And it doesn't seem to contain the _CST
> > table, maybe because C-states are disabled in the BIOS?
> 
> Yep, I meant acpidump with c-states enabled.  But don't worry about it.

It's at

http://joni.heaven-industries.com/~egon/tornado/acpidump-cst-enabled.txt

(much bigger now with C-states enabled)

> Here's the best article on such trouble I can find from a quick
> search: [1].  Do you happen to know the name of the erratum, or an
> Intel document describing it?

No.  Nothing beyond what an ordinary person can find on the web:

http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-8800-4800-2800-families-specification-update.html

...there are quite a few errata in there mentioning C-states.

> I don't see any relevant fixes upstream recently, but please confirm
> the problem with a 3.2 release candidate from experimental, and then
> we should take this upstream (that means the linux-pm@vger.kernel.org
> list, cc-ing Len Brown <lenb@kernel.org>,
> linux-kernel@vger.kernel.org, and either me or this bug log so we can
> track it).

Actually, 3.2.0rc4 seems to run well!  The big (and relevant) change since
the squeeze kernel is that the new one has the intel_idle driver (which kicks
in instead of acpi_idle):

(2.6.32)
root@tornado:/sys/devices/system/cpu/cpu0/cpuidle# grep . */*
state0/desc:CPUIDLE CORE POLL IDLE
state0/latency:0
state0/name:C0
state0/power:4294967295
state0/time:735633
state0/usage:85
state1/desc:ACPI FFH INTEL MWAIT 0x0
state1/latency:1
state1/name:C1
state1/power:1000
state1/time:457743320
state1/usage:49228
state2/desc:ACPI FFH INTEL MWAIT 0x10
state2/latency:64
state2/name:C2
state2/power:500
state2/time:86053513
state2/usage:47278
state3/desc:ACPI FFH INTEL MWAIT 0x20
state3/latency:96
state3/name:C3
state3/power:350
state3/time:1759785634
state3/usage:365191

(3.2.0rc4)
root@tornado:/sys/devices/system/cpu/cpu0/cpuidle# grep . */*
state0/desc:CPUIDLE CORE POLL IDLE
state0/latency:0
state0/name:POLL
state0/power:4294967295
state0/time:1985644
state0/usage:9221
state1/desc:MWAIT 0x00
state1/latency:3
state1/name:C1-NHM
state1/power:4294967294
state1/time:411021632
state1/usage:9459157
state2/desc:MWAIT 0x10
state2/latency:20
state2/name:C3-NHM
state2/power:4294967293
state2/time:1632399690
state2/usage:2649168
state3/desc:MWAIT 0x20
state3/latency:200
state3/name:C6-NHM
state3/power:4294967292
state3/time:51728887664
state3/usage:8613936

...and it possibly handles idle somehow better, even though it (as it seems)
lets the CPU enter the deeper states too (the ones that may be problematic on
Nehalem/Westmere?).  However, I can trigger the lockup only about once a day,
so this needs more testing.  So far I only know that the experimental kernel
survived one run of the same test case that reliably crashes the squeeze
kernel (with no kernel cmdline tweaks regarding idle behavior).

So regarding a possible 2.6.32.x patch, I'm not sure what to suggest; an
ACPI blacklist update, perhaps?

Now I'm switching back to the stable kernel (as it's the one the company
wants to run in the long term) and I will try to confirm the
processor.max_cstate cmdline workaround for the acpi_idle driver.  I'll be
back.
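
When testing such a workaround it helps to verify that the limit really made
it onto the kernel command line of the running system.  A minimal sketch in
Python, assuming the option is passed as processor.max_cstate=N; the function
name is hypothetical, and intel_idle.max_cstate (the corresponding option of
the intel_idle driver) is included as my own addition, it was not part of the
test described here:

```python
def max_cstate_limits(cmdline):
    """Return {param: int} for the max_cstate options found on a kernel cmdline."""
    wanted = ("processor.max_cstate", "intel_idle.max_cstate")
    limits = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted and value.isdigit():
            limits[key] = int(value)
    return limits


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel was booted with.
        with open("/proc/cmdline") as f:
            print(max_cstate_limits(f.read()))
    except OSError:
        print("no /proc/cmdline available")
```

An empty result means neither option is set, so the idle driver is free to
use every state it knows about.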

Thanks,

Egon


