Re: Help needed fixing kernel errors
On Tue, 26 Apr 2011 14:37:24 +1000, Steven wrote in message
<1303792644.6192.14.camel@square>:
> Hi folks,
> I have a problem that's now beyond my expertise to diagnose properly. I
> get random, intermittent kernel errors, usually when the system is
> under stress.
>
> System specs;
> AMD X4 840 (badged Phenom II, but it's really an Athlon core)
> ASUS M4A88TD-M EVO/USB3
> 2x 2GB sticks of Corsair 1600 DDR3
> 500GB WD Caviar Blue.
>
> Below are some examples of the errors.
>
> square kernel: [ 683.271626] Pid: 6593, comm: rsync Tainted: P D
> 2.6.32-5-amd64 #1
> Apr 24 14:51:38 square kernel: [ 683.271631] Call Trace:
> Apr 24 14:51:38 square kernel: [ 683.271648] [<ffffffff810cad37>] ?
> print_bad_pte+0x232/0x24a
> Apr 24 14:51:38 square kernel: [ 683.271660] [<ffffffff810cbde7>] ?
> unmap_vmas+0x62d/0x931
> Apr 24 14:51:38 square kernel: [ 683.271672] [<ffffffff8118e194>] ?
> cpumask_any_but+0x28/0x34
> Apr 24 14:51:38 square kernel: [ 683.271682] [<ffffffff810d04c4>] ?
> exit_mmap+0xc4/0x148
> Apr 24 14:51:38 square kernel: [ 683.271690] [<ffffffff8104bc6d>] ?
> mmput+0x3c/0xdf
> Apr 24 14:51:38 square kernel: [ 683.271698] [<ffffffff8104f866>] ?
> exit_mm+0x102/0x10d
> Apr 24 14:51:38 square kernel: [ 683.271705] [<ffffffff8105128b>] ?
> do_exit+0x1f8/0x6c6
> Apr 24 14:51:38 square kernel: [ 683.271712] [<ffffffff810517cf>] ?
> do_group_exit+0x76/0x9d
> Apr 24 14:51:38 square kernel: [ 683.271720] [<ffffffff81051808>] ?
> sys_exit_group+0x12/0x16
> Apr 24 14:51:38 square kernel: [ 683.271727] [<ffffffff81010b42>] ?
> system_call_fastpath+0x16/0x1b
> Apr 24 14:51:44 square kerneloops: Submitted 1 kernel oopses to
> www.kerneloops.org
>
> Another from minecraft;
>
> d: 6742, comm: java Tainted: P B D 2.6.32-5-amd64 #1
> Apr 24 15:12:02 square kernel: [ 1907.726033] Call Trace:
> Apr 24 15:12:02 square kernel: [ 1907.726039] [<ffffffff810b7a11>] ?
> bad_page+0x116/0x129
> Apr 24 15:12:02 square kernel: [ 1907.726042] [<ffffffff810b9b2e>] ?
> get_page_from_freelist+0x4fd/0x760
> Apr 24 15:12:02 square kernel: [ 1907.726098] [<ffffffffa0246f02>] ?
> firegl_trace+0x72/0x1e0 [fglrx]
> Apr 24 15:12:02 square kernel: [ 1907.726100] [<ffffffff810ba0f8>] ?
> __alloc_pages_nodemask+0x11c/0x5f4
> Apr 24 15:12:02 square kernel: [ 1907.726104] [<ffffffff81036605>] ?
> native_flush_tlb_others+0xb6/0xe3
> Apr 24 15:12:02 square kernel: [ 1907.726107] [<ffffffff810bc479>] ?
> ____pagevec_lru_add+0x160/0x176
> Apr 24 15:12:02 square kernel: [ 1907.726110] [<ffffffff810cc981>] ?
> handle_mm_fault+0x27a/0x80f
> Apr 24 15:12:02 square kernel: [ 1907.726113] [<ffffffff812fe6b6>] ?
> do_page_fault+0x2e0/0x2fc
> Apr 24 15:12:02 square kernel: [ 1907.726116] [<ffffffff812fc555>] ?
> page_fault+0x25/0x30
>
> Another one from stress.
>
> stress D 0000000000000000 0 5972 5963 0x00000000
> Apr 25 21:16:11 square kernel: [ 360.740389] ffff88011b04dbd0
> 0000000000000082 ffff880114f40150 000000000000000e
> Apr 25 21:16:11 square kernel: [ 360.740392] 0007ffffffffffff
> 0000000000000000 000000000000f9e0 ffff880100329fd8
> Apr 25 21:16:11 square kernel: [ 360.740395] 0000000000015780
> 0000000000015780 ffff88011b04f100 ffff88011b04f3f8
> Apr 25 21:16:11 square kernel: [ 360.740397] Call Trace:
> Apr 25 21:16:11 square kernel: [ 360.740404] [<ffffffff8104001f>] ?
> check_preempt_wakeup+0x1dd/0x268
> Apr 25 21:16:11 square kernel: [ 360.740408] [<ffffffff812fb65b>] ?
> __mutex_lock_common+0x122/0x192
> Apr 25 21:16:11 square kernel: [ 360.740411] [<ffffffff810493e0>] ?
> update_rq_clock+0xf/0x28
> Apr 25 21:16:11 square kernel: [ 360.740413] [<ffffffff812fb783>] ?
> mutex_lock+0x1a/0x31
> Apr 25 21:16:11 square kernel: [ 360.740416] [<ffffffff8110be35>] ?
> sync_filesystems+0x13/0xe3
> Apr 25 21:16:11 square kernel: [ 360.740418] [<ffffffff8110bf4a>] ?
> sys_sync+0x1c/0x2e
> Apr 25 21:16:11 square kernel: [ 360.740420] [<ffffffff81010b42>] ?
> system_call_fastpath+0x16/0x1b
> Apr 25 21:18:11 square kernel: [ 480.740375] stress D
> ffff8800cf609c40 0 5965 5963 0x00000000
> Apr 25 21:18:11 square kernel: [ 480.740378] ffff8800cf609c40
> 0000000000000086 ffffffff810414d5 000000010000000e
> Apr 25 21:18:11 square kernel: [ 480.740381] 0000000000015780
> ffff880100383e68 000000000000f9e0 ffff880100383fd8
> Apr 25 21:18:11 square kernel: [ 480.740383] 0000000000015780
> 0000000000015780 ffff8800cf60f100 ffff8800cf60f3f8
> Apr 25 21:18:11 square kernel: [ 480.740385] Call Trace:
> Apr 25 21:18:11 square kernel: [ 480.740392] [<ffffffff810414d5>] ?
> select_task_rq_fair+0x472/0x836
> Apr 25 21:18:11 square kernel: [ 480.740395] [<ffffffff8101650e>] ?
> native_sched_clock+0x2e/0x66
> Apr 25 21:18:11 square kernel: [ 480.740397] [<ffffffff8103fc8e>] ?
> update_curr+0xa6/0x147
> Apr 25 21:18:11 square kernel: [ 480.740399] [<ffffffff8101654b>] ?
> sched_clock+0x5/0x8
> Apr 25 21:18:11 square kernel: [ 480.740402] [<ffffffff812fb65b>] ?
> __mutex_lock_common+0x122/0x192
> Apr 25 21:18:11 square kernel: [ 480.740404] [<ffffffff812fb783>] ?
> mutex_lock+0x1a/0x31
> Apr 25 21:18:11 square kernel: [ 480.740407] [<ffffffff8110be35>] ?
> sync_filesystems+0x13/0xe3
> Apr 25 21:18:11 square kernel: [ 480.740409] [<ffffffff8110bf40>] ?
> sys_sync+0x12/0x2e
> Apr 25 21:18:11 square kernel: [ 480.740411] [<ffffffff81010b42>] ?
> system_call_fastpath+0x16/0x1b
>
> My attempts at troubleshooting this have been like so;
>
> 1) Compile kernels and flightgear.
..flightgear, from git?
> Usually fails after 10 mins or so.
> 2) Remove one mem stick, swap with other. Try different slots. It
> fails "less often" with one stick than with both.
> 3) Memtest86+ shows both sticks to be ok.
..memory running too hot? Fan air ducting around your memory
sticks might help.
> 4) Ran "stress". This fails more often if I enable hdd tests but it
> still fails.
> 5) Installed fedora to prove it's not just a Debian thing. Errors are
> the exact same under fedora.
>
> I'm at a loss as to what it could be and would like to determine at
> least something before I start throwing money around. All I have left
> is that some incompatibility between mobo/mem/cpu/disk is causing
> this.
>
> Does anyone have any advice on what tools I can use to narrow it down
> more or eliminate certain components?
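..one cheap thing to try before spending money: tally which kernel
functions show up in the call traces, to see whether the failures
cluster in the page-handling paths. The sketch below is only an
illustration, not a diagnostic tool. The function names
count_trace_symbols and corruption_hits are made up here, and the
regex assumes trace entries shaped like the ones quoted above
(symbol+0xOFFSET/0xSIZE). print_bad_pte and bad_page are the kernel
routines that report a bad page map or bad page state. Seeing them
often points towards memory corruption, but it does not tell you
whether RAM, heat, the board or a tainting module such as fglrx is
to blame.

```python
import re
from collections import Counter

# Matches call-trace entries such as "print_bad_pte+0x232/0x24a".
# Assumption: the log uses the symbol+0xOFFSET/0xSIZE form seen in this thread.
SYMBOL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Kernel routines that report corrupted page-table or page state.
# Both appear in the traces quoted in this thread.
MEMORY_CORRUPTION_HINTS = {"print_bad_pte", "bad_page"}


def count_trace_symbols(log_text):
    """Count how often each function name appears in call-trace entries."""
    return Counter(SYMBOL_RE.findall(log_text))


def corruption_hits(log_text):
    """Return counts for the memory-corruption reporting routines only."""
    counts = count_trace_symbols(log_text)
    return {name: counts[name] for name in MEMORY_CORRUPTION_HINTS if counts[name]}
```

To use it, read a saved copy of the kernel log into a string (the
path depends on your setup, e.g. a copy of kern.log) and print
count_trace_symbols(text).most_common(10). If the count for
print_bad_pte or bad_page drops when you swap a stick or add a fan,
that narrows things down a bit.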
--
..med vennlig hilsen = with Kind Regards from Arnt Karlsen
...with a number of polar bear hunters in his ancestry...
Scenarios always come in sets of three:
best case, worst case, and just in case.