
Re: Help needed fixing kernel errors



On Tue, 26 Apr 2011 14:37:24 +1000, Steven wrote in message 
<1303792644.6192.14.camel@square>:

> Hi folks,
> I have a problem that's now beyond my ability to diagnose properly. I
> get random, intermittent kernel errors, usually when the system is
> under stress.
> 
> System specs:
> AMD X4 840 (badged Phenom II but it's really an Athlon core)
> ASUS M4A88TD-M EVO/USB3
> 2x 2GB sticks of Corsair 1600 DDR3
> 500GB WD Caviar Blue.
> 
> Below are some examples of the errors.
> 
> square kernel: [  683.271626] Pid: 6593, comm: rsync Tainted: P      D
> 2.6.32-5-amd64 #1
> Apr 24 14:51:38 square kernel: [  683.271631] Call Trace:
> Apr 24 14:51:38 square kernel: [  683.271648]  [<ffffffff810cad37>] ?
> print_bad_pte+0x232/0x24a
> Apr 24 14:51:38 square kernel: [  683.271660]  [<ffffffff810cbde7>] ?
> unmap_vmas+0x62d/0x931
> Apr 24 14:51:38 square kernel: [  683.271672]  [<ffffffff8118e194>] ?
> cpumask_any_but+0x28/0x34
> Apr 24 14:51:38 square kernel: [  683.271682]  [<ffffffff810d04c4>] ?
> exit_mmap+0xc4/0x148
> Apr 24 14:51:38 square kernel: [  683.271690]  [<ffffffff8104bc6d>] ?
> mmput+0x3c/0xdf
> Apr 24 14:51:38 square kernel: [  683.271698]  [<ffffffff8104f866>] ?
> exit_mm+0x102/0x10d
> Apr 24 14:51:38 square kernel: [  683.271705]  [<ffffffff8105128b>] ?
> do_exit+0x1f8/0x6c6
> Apr 24 14:51:38 square kernel: [  683.271712]  [<ffffffff810517cf>] ?
> do_group_exit+0x76/0x9d
> Apr 24 14:51:38 square kernel: [  683.271720]  [<ffffffff81051808>] ?
> sys_exit_group+0x12/0x16
> Apr 24 14:51:38 square kernel: [  683.271727]  [<ffffffff81010b42>] ?
> system_call_fastpath+0x16/0x1b
> Apr 24 14:51:44 square kerneloops: Submitted 1 kernel oopses to
> www.kerneloops.org
> 
> Another from Minecraft:
> 
> d: 6742, comm: java Tainted: P    B D    2.6.32-5-amd64 #1
> Apr 24 15:12:02 square kernel: [ 1907.726033] Call Trace:
> Apr 24 15:12:02 square kernel: [ 1907.726039]  [<ffffffff810b7a11>] ?
> bad_page+0x116/0x129
> Apr 24 15:12:02 square kernel: [ 1907.726042]  [<ffffffff810b9b2e>] ?
> get_page_from_freelist+0x4fd/0x760
> Apr 24 15:12:02 square kernel: [ 1907.726098]  [<ffffffffa0246f02>] ?
> firegl_trace+0x72/0x1e0 [fglrx]
> Apr 24 15:12:02 square kernel: [ 1907.726100]  [<ffffffff810ba0f8>] ?
> __alloc_pages_nodemask+0x11c/0x5f4
> Apr 24 15:12:02 square kernel: [ 1907.726104]  [<ffffffff81036605>] ?
> native_flush_tlb_others+0xb6/0xe3
> Apr 24 15:12:02 square kernel: [ 1907.726107]  [<ffffffff810bc479>] ?
> ____pagevec_lru_add+0x160/0x176
> Apr 24 15:12:02 square kernel: [ 1907.726110]  [<ffffffff810cc981>] ?
> handle_mm_fault+0x27a/0x80f
> Apr 24 15:12:02 square kernel: [ 1907.726113]  [<ffffffff812fe6b6>] ?
> do_page_fault+0x2e0/0x2fc
> Apr 24 15:12:02 square kernel: [ 1907.726116]  [<ffffffff812fc555>] ?
> page_fault+0x25/0x30
> 
> Another one from stress:
> 
> stress        D 0000000000000000     0  5972   5963 0x00000000
> Apr 25 21:16:11 square kernel: [  360.740389]  ffff88011b04dbd0
> 0000000000000082 ffff880114f40150 000000000000000e
> Apr 25 21:16:11 square kernel: [  360.740392]  0007ffffffffffff
> 0000000000000000 000000000000f9e0 ffff880100329fd8
> Apr 25 21:16:11 square kernel: [  360.740395]  0000000000015780
> 0000000000015780 ffff88011b04f100 ffff88011b04f3f8
> Apr 25 21:16:11 square kernel: [  360.740397] Call Trace:
> Apr 25 21:16:11 square kernel: [  360.740404]  [<ffffffff8104001f>] ?
> check_preempt_wakeup+0x1dd/0x268
> Apr 25 21:16:11 square kernel: [  360.740408]  [<ffffffff812fb65b>] ?
> __mutex_lock_common+0x122/0x192
> Apr 25 21:16:11 square kernel: [  360.740411]  [<ffffffff810493e0>] ?
> update_rq_clock+0xf/0x28
> Apr 25 21:16:11 square kernel: [  360.740413]  [<ffffffff812fb783>] ?
> mutex_lock+0x1a/0x31
> Apr 25 21:16:11 square kernel: [  360.740416]  [<ffffffff8110be35>] ?
> sync_filesystems+0x13/0xe3
> Apr 25 21:16:11 square kernel: [  360.740418]  [<ffffffff8110bf4a>] ?
> sys_sync+0x1c/0x2e
> Apr 25 21:16:11 square kernel: [  360.740420]  [<ffffffff81010b42>] ?
> system_call_fastpath+0x16/0x1b
> Apr 25 21:18:11 square kernel: [  480.740375] stress        D
> ffff8800cf609c40     0  5965   5963 0x00000000
> Apr 25 21:18:11 square kernel: [  480.740378]  ffff8800cf609c40
> 0000000000000086 ffffffff810414d5 000000010000000e
> Apr 25 21:18:11 square kernel: [  480.740381]  0000000000015780
> ffff880100383e68 000000000000f9e0 ffff880100383fd8
> Apr 25 21:18:11 square kernel: [  480.740383]  0000000000015780
> 0000000000015780 ffff8800cf60f100 ffff8800cf60f3f8
> Apr 25 21:18:11 square kernel: [  480.740385] Call Trace:
> Apr 25 21:18:11 square kernel: [  480.740392]  [<ffffffff810414d5>] ?
> select_task_rq_fair+0x472/0x836
> Apr 25 21:18:11 square kernel: [  480.740395]  [<ffffffff8101650e>] ?
> native_sched_clock+0x2e/0x66
> Apr 25 21:18:11 square kernel: [  480.740397]  [<ffffffff8103fc8e>] ?
> update_curr+0xa6/0x147
> Apr 25 21:18:11 square kernel: [  480.740399]  [<ffffffff8101654b>] ?
> sched_clock+0x5/0x8
> Apr 25 21:18:11 square kernel: [  480.740402]  [<ffffffff812fb65b>] ?
> __mutex_lock_common+0x122/0x192
> Apr 25 21:18:11 square kernel: [  480.740404]  [<ffffffff812fb783>] ?
> mutex_lock+0x1a/0x31
> Apr 25 21:18:11 square kernel: [  480.740407]  [<ffffffff8110be35>] ?
> sync_filesystems+0x13/0xe3
> Apr 25 21:18:11 square kernel: [  480.740409]  [<ffffffff8110bf40>] ?
> sys_sync+0x12/0x2e
> Apr 25 21:18:11 square kernel: [  480.740411]  [<ffffffff81010b42>] ?
> system_call_fastpath+0x16/0x1b
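
..the traces above come through wrapped by the mailer, which makes them
awkward to compare. Here is a rough Python sketch, not a tested tool, that
pulls the function names back out of such wrapped text. The helper name
`trace_symbols` and the regular expression are my own assumptions, not part
of any kernel tooling:

```python
import re

# Matches "[<address>] ? symbol+0xoff/0xsize", allowing the mail wrap
# (a newline plus an optional "> " quote marker) between "?" and the symbol.
ENTRY_RE = re.compile(r"\[<[0-9a-f]+>\]\s*\?\s*(?:>\s*)?([A-Za-z_]\w*)\+0x")


def trace_symbols(text):
    """Return the function names of all call-trace entries, in order."""
    return ENTRY_RE.findall(text)


# Two wrapped entries copied from the rsync trace quoted above.
sample = """\
> Apr 24 14:51:38 square kernel: [  683.271648]  [<ffffffff810cad37>] ?
> print_bad_pte+0x232/0x24a
> Apr 24 14:51:38 square kernel: [  683.271660]  [<ffffffff810cbde7>] ?
> unmap_vmas+0x62d/0x931
"""
print(trace_symbols(sample))
```

..run over the whole mail, the rsync and java traces start in
print_bad_pte and bad_page, both memory management code, while the two
stress reports are tasks in D state waiting on a mutex under sys_sync.
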
> 
> My troubleshooting attempts so far:
> 
> 1) Compile kernels and flightgear. 

..flightgear, from git?

> Usually fails after 10 mins or so.
> 2) Remove one mem stick, swap with other. Try different slots. It
> fails "less often" with one stick than with both. 
> 3) Memtest86+ shows both sticks to be ok.

..is the memory running too hot?  Ducting fan air over your memory
sticks might help.

> 4) Ran "stress". It fails more often with the hdd tests enabled, but it
> still fails without them.
> 5) Installed Fedora to prove it's not just a Debian thing. Errors are
> exactly the same under Fedora.
> 
> I'm at a loss as to what it could be and would like to determine at
> least something before I start throwing money around. All I have left
> is that some incompatibility between mobo/mem/cpu/disk is causing
> this.
> 
> Does anyone have any advice on what tools I can use to narrow it down
> more or eliminate certain components?
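
..one cheap thing to read off the logs you already have is the "Tainted:"
field. A minimal Python sketch, assuming the flag meanings documented for
2.6.32-era kernels; the table covers only the three letters seen in your
logs, and the name `decode_taint` is just something I made up for
illustration:

```python
import re

# Taint letters that appear in the quoted logs, with their meanings as
# documented for 2.6.32-era kernels. Not a complete table.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "B": "bad page state detected",
    "D": "kernel died recently (oops or BUG)",
}

TAINT_RE = re.compile(r"comm: (\S+) Tainted: ([A-Z ]+)")


def decode_taint(line):
    """Return (command, [(letter, meaning), ...]) or None if no taint info."""
    match = TAINT_RE.search(line)
    if match is None:
        return None
    letters = match.group(2).replace(" ", "")
    meanings = [(c, TAINT_FLAGS.get(c, "not in this table")) for c in letters]
    return match.group(1), meanings


# The two taint lines from the quoted logs, as they appear in the mail.
for line in (
    "Pid: 6593, comm: rsync Tainted: P      D",
    "d: 6742, comm: java Tainted: P    B D    2.6.32-5-amd64 #1",
):
    print(decode_taint(line))
```

..the P is presumably the fglrx module that shows up in the java trace, so
re-running stress with that module unloaded might at least rule it in or
out before you spend money on hardware.
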


-- 
..med vennlig hilsen = with Kind Regards from Arnt Karlsen
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three: 
  best case, worst case, and just in case.

