Help needed fixing kernel errors



Hi folks,
I have a problem that is now beyond my ability to diagnose properly. I
get random, intermittent kernel errors, usually when the system is under
stress.

System specs:
AMD X4 840 (badged as a Phenom II, but it's really an Athlon core)
ASUS M4A88TD-M EVO/USB3
2x 2GB sticks of Corsair 1600 DDR3
500GB WD Caviar Blue

Below are some examples of the errors.

square kernel: [  683.271626] Pid: 6593, comm: rsync Tainted: P      D
2.6.32-5-amd64 #1
Apr 24 14:51:38 square kernel: [  683.271631] Call Trace:
Apr 24 14:51:38 square kernel: [  683.271648]  [<ffffffff810cad37>] ?
print_bad_pte+0x232/0x24a
Apr 24 14:51:38 square kernel: [  683.271660]  [<ffffffff810cbde7>] ?
unmap_vmas+0x62d/0x931
Apr 24 14:51:38 square kernel: [  683.271672]  [<ffffffff8118e194>] ?
cpumask_any_but+0x28/0x34
Apr 24 14:51:38 square kernel: [  683.271682]  [<ffffffff810d04c4>] ?
exit_mmap+0xc4/0x148
Apr 24 14:51:38 square kernel: [  683.271690]  [<ffffffff8104bc6d>] ?
mmput+0x3c/0xdf
Apr 24 14:51:38 square kernel: [  683.271698]  [<ffffffff8104f866>] ?
exit_mm+0x102/0x10d
Apr 24 14:51:38 square kernel: [  683.271705]  [<ffffffff8105128b>] ?
do_exit+0x1f8/0x6c6
Apr 24 14:51:38 square kernel: [  683.271712]  [<ffffffff810517cf>] ?
do_group_exit+0x76/0x9d
Apr 24 14:51:38 square kernel: [  683.271720]  [<ffffffff81051808>] ?
sys_exit_group+0x12/0x16
Apr 24 14:51:38 square kernel: [  683.271727]  [<ffffffff81010b42>] ?
system_call_fastpath+0x16/0x1b
Apr 24 14:51:44 square kerneloops: Submitted 1 kernel oopses to
www.kerneloops.org

Another from Minecraft:

d: 6742, comm: java Tainted: P    B D    2.6.32-5-amd64 #1
Apr 24 15:12:02 square kernel: [ 1907.726033] Call Trace:
Apr 24 15:12:02 square kernel: [ 1907.726039]  [<ffffffff810b7a11>] ?
bad_page+0x116/0x129
Apr 24 15:12:02 square kernel: [ 1907.726042]  [<ffffffff810b9b2e>] ?
get_page_from_freelist+0x4fd/0x760
Apr 24 15:12:02 square kernel: [ 1907.726098]  [<ffffffffa0246f02>] ?
firegl_trace+0x72/0x1e0 [fglrx]
Apr 24 15:12:02 square kernel: [ 1907.726100]  [<ffffffff810ba0f8>] ?
__alloc_pages_nodemask+0x11c/0x5f4
Apr 24 15:12:02 square kernel: [ 1907.726104]  [<ffffffff81036605>] ?
native_flush_tlb_others+0xb6/0xe3
Apr 24 15:12:02 square kernel: [ 1907.726107]  [<ffffffff810bc479>] ?
____pagevec_lru_add+0x160/0x176
Apr 24 15:12:02 square kernel: [ 1907.726110]  [<ffffffff810cc981>] ?
handle_mm_fault+0x27a/0x80f
Apr 24 15:12:02 square kernel: [ 1907.726113]  [<ffffffff812fe6b6>] ?
do_page_fault+0x2e0/0x2fc
Apr 24 15:12:02 square kernel: [ 1907.726116]  [<ffffffff812fc555>] ?
page_fault+0x25/0x30

Another one from stress:

stress        D 0000000000000000     0  5972   5963 0x00000000
Apr 25 21:16:11 square kernel: [  360.740389]  ffff88011b04dbd0
0000000000000082 ffff880114f40150 000000000000000e
Apr 25 21:16:11 square kernel: [  360.740392]  0007ffffffffffff
0000000000000000 000000000000f9e0 ffff880100329fd8
Apr 25 21:16:11 square kernel: [  360.740395]  0000000000015780
0000000000015780 ffff88011b04f100 ffff88011b04f3f8
Apr 25 21:16:11 square kernel: [  360.740397] Call Trace:
Apr 25 21:16:11 square kernel: [  360.740404]  [<ffffffff8104001f>] ?
check_preempt_wakeup+0x1dd/0x268
Apr 25 21:16:11 square kernel: [  360.740408]  [<ffffffff812fb65b>] ?
__mutex_lock_common+0x122/0x192
Apr 25 21:16:11 square kernel: [  360.740411]  [<ffffffff810493e0>] ?
update_rq_clock+0xf/0x28
Apr 25 21:16:11 square kernel: [  360.740413]  [<ffffffff812fb783>] ?
mutex_lock+0x1a/0x31
Apr 25 21:16:11 square kernel: [  360.740416]  [<ffffffff8110be35>] ?
sync_filesystems+0x13/0xe3
Apr 25 21:16:11 square kernel: [  360.740418]  [<ffffffff8110bf4a>] ?
sys_sync+0x1c/0x2e
Apr 25 21:16:11 square kernel: [  360.740420]  [<ffffffff81010b42>] ?
system_call_fastpath+0x16/0x1b
Apr 25 21:18:11 square kernel: [  480.740375] stress        D
ffff8800cf609c40     0  5965   5963 0x00000000
Apr 25 21:18:11 square kernel: [  480.740378]  ffff8800cf609c40
0000000000000086 ffffffff810414d5 000000010000000e
Apr 25 21:18:11 square kernel: [  480.740381]  0000000000015780
ffff880100383e68 000000000000f9e0 ffff880100383fd8
Apr 25 21:18:11 square kernel: [  480.740383]  0000000000015780
0000000000015780 ffff8800cf60f100 ffff8800cf60f3f8
Apr 25 21:18:11 square kernel: [  480.740385] Call Trace:
Apr 25 21:18:11 square kernel: [  480.740392]  [<ffffffff810414d5>] ?
select_task_rq_fair+0x472/0x836
Apr 25 21:18:11 square kernel: [  480.740395]  [<ffffffff8101650e>] ?
native_sched_clock+0x2e/0x66
Apr 25 21:18:11 square kernel: [  480.740397]  [<ffffffff8103fc8e>] ?
update_curr+0xa6/0x147
Apr 25 21:18:11 square kernel: [  480.740399]  [<ffffffff8101654b>] ?
sched_clock+0x5/0x8
Apr 25 21:18:11 square kernel: [  480.740402]  [<ffffffff812fb65b>] ?
__mutex_lock_common+0x122/0x192
Apr 25 21:18:11 square kernel: [  480.740404]  [<ffffffff812fb783>] ?
mutex_lock+0x1a/0x31
Apr 25 21:18:11 square kernel: [  480.740407]  [<ffffffff8110be35>] ?
sync_filesystems+0x13/0xe3
Apr 25 21:18:11 square kernel: [  480.740409]  [<ffffffff8110bf40>] ?
sys_sync+0x12/0x2e
Apr 25 21:18:11 square kernel: [  480.740411]  [<ffffffff81010b42>] ?
system_call_fastpath+0x16/0x1b
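
For anyone comparing the traces above, here is a minimal sketch that
pulls the process name and the list of functions out of each trace, so
the failures can be grouped by what they have in common. It assumes the
log looks like the excerpts in this message (a "comm:" header followed by
"[<address>] ? function+0x.../0x..." frames, possibly wrapped across
lines). The function name summarize_traces is my own invention and not
part of any tool.

```python
import re

# "comm: rsync" style header that starts each report in the excerpts above.
COMM_RE = re.compile(r"comm:\s+(\S+)")

# One stack frame, e.g. "[<ffffffff810cad37>] ?\nprint_bad_pte+0x232/0x24a".
# \s+ also matches the newline inserted where the mail client wrapped the line.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+\?\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_traces(log_text):
    """Return [(process_name, [function, ...]), ...], one entry per 'comm:' header."""
    results = []
    headers = list(COMM_RE.finditer(log_text))
    for i, header in enumerate(headers):
        # Frames belong to this header until the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        frames = FRAME_RE.findall(log_text[header.end():end])
        results.append((header.group(1), frames))
    return results
```

Feeding it the saved kernel log as one string (on Debian that is usually
/var/log/kern.log, which is an assumption about the setup) gives one
entry per report. Hung-task reports like the "stress" ones have no
"comm:" header, so this sketch skips them.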

My attempts at troubleshooting this so far:

1) Compiled kernels and FlightGear. The build usually fails after 10
minutes or so.
2) Removed one memory stick, swapped it with the other, and tried
different slots. It fails "less often" with one stick than with both.
3) Memtest86+ shows both sticks to be OK.
4) Ran "stress". It fails more often if I enable the HDD tests, but it
still fails either way.
5) Installed Fedora to prove it's not just a Debian thing. The errors
are exactly the same under Fedora.
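
As a rough user-space cross-check for step 3, something like the sketch
below could be run while the machine is under load. It fills a buffer
with a fixed byte pattern, reads it back, and counts the bytes that no
longer match. This is only an illustration: the name pattern_check, the
patterns, and the sizes are my own assumptions. It touches only the
memory the OS happens to hand to the process, so a clean run proves
nothing and it is no substitute for Memtest86+.

```python
def pattern_check(size_bytes, passes=1, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each byte pattern into a buffer and count bytes that read back wrong."""
    mismatches = 0
    for _ in range(passes):
        for pattern in patterns:
            # Allocate and fill the buffer with the pattern byte.
            buf = bytearray([pattern]) * size_bytes
            # Read it back: any byte not equal to the pattern is a mismatch.
            mismatches += size_bytes - buf.count(pattern)
    return mismatches
```

On healthy hardware this should return 0 for any size. A call such as
pattern_check(512 * 1024 * 1024, passes=10) is one possible starting
point on a 4GB machine, though that size is a guess and not a tested
value.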

I'm at a loss as to what it could be and would like to pin down at
least something before I start throwing money around. The only theory I
have left is that some incompatibility between mobo/mem/cpu/disk is
causing this.

Does anyone have any advice on what tools I can use to narrow it down
more or eliminate certain components?
