
kernel soft lockup (nagios related?)



Hello,

I've got a server which has now crashed a few times in a similar
fashion. I even tried moving to new hardware, with the same result
(though on the new hardware it seems to happen more frequently), so
this looks like some interaction between nagios and the kernel causing
a soft lockup. Any ideas on how to resolve this would be appreciated.
Unfortunately this is the only log I have of the event: the first
event didn't produce any output like this, and I didn't keep the logs
from the previous hardware, as I thought they might have been isolated
incidents.

The previous hardware was running Lenny rather than Squeeze, so this
doesn't seem to be isolated to one particular version of anything.

Let me know if there is any more information which would be of use.

There are quite a few bits of software running on here: RTG, Cricket,
Nagios, smokeping and rancid.

Debian 6.0.3

Linux zzz-zzz 2.6.32-5-686-bigmem #1 SMP Wed Jan 11 13:17:56 UTC 2012
i686 GNU/Linux

Jan 22 22:40:40 zzz-zzz kernel: [176617.648985] BUG: soft lockup -
CPU#13 stuck for 61s! [nagios3:2070]
Jan 22 22:40:40 zzz-zzz kernel: [176617.649040] Modules linked in:
netconsole configfs joydev usbhid hid xt_multiport iptable_filter
ip_tables x_tables 8021q garp stp loop snd_pcm snd_timer snd soundcore
snd_page_alloc ioatdma pcspkr evdev cdc_ether usbnet button processor
serio_raw dca mii shpchp pci_hotplug i2c_i801 i2c_core ext4 mbcache
jbd2 crc16 raid10 md_mod sd_mod crc_t10dif ata_generic uhci_hcd
megaraid_sas ata_piix ehci_hcd libata usbcore scsi_mod nls_base
thermal bnx2 thermal_sys [last unloaded: netconsole]
Jan 22 22:40:40 zzz-zzz kernel: [176617.649078]
Jan 22 22:40:40 zzz-zzz kernel: [176617.649082] Pid: 2070, comm:
nagios3 Not tainted (2.6.32-5-686-bigmem #1) System x3550 M3
-[7944D2M]-
Jan 22 22:40:40 zzz-zzz kernel: [176617.649085] EIP: 0060:[<c10249bb>]
EFLAGS: 00000202 CPU: 13
Jan 22 22:40:40 zzz-zzz kernel: [176617.649094] EIP is at
native_flush_tlb_others+0x85/0xa6
Jan 22 22:40:40 zzz-zzz kernel: [176617.649096] EAX: 00000282 EBX:
c14661ac ECX: c10200d8 EDX: 00000020
Jan 22 22:40:40 zzz-zzz kernel: [176617.649099] ESI: 00000005 EDI:
00000140 EBP: c14661a0 ESP: ee4c9a3c
Jan 22 22:40:40 zzz-zzz kernel: [176617.649101]  DS: 007b ES: 007b FS:
00d8 GS: 00e0 SS: 0068
Jan 22 22:40:40 zzz-zzz kernel: [176617.649104] CR0: 8005003b CR2:
b758a376 CR3: 2eb7e000 CR4: 000006f0
Jan 22 22:40:40 zzz-zzz kernel: [176617.649106] DR0: 00000000 DR1:
00000000 DR2: 00000000 DR3: 00000000
Jan 22 22:40:40 zzz-zzz kernel: [176617.649108] DR6: ffff0ff0 DR7: 00000400
Jan 22 22:40:40 zzz-zzz kernel: [176617.649110] Call Trace:
Jan 22 22:40:40 zzz-zzz kernel: [176617.649116]  [<c1024aa3>] ?
flush_tlb_page+0x5d/0x65
Jan 22 22:40:40 zzz-zzz kernel: [176617.649120]  [<c1023e90>] ?
ptep_set_access_flags+0x59/0x63
Jan 22 22:40:40 zzz-zzz kernel: [176617.649125]  [<c10a1040>] ?
do_wp_page+0x3b9/0x7dd
Jan 22 22:40:40 zzz-zzz kernel: [176617.649131]  [<c1031770>] ?
finish_task_switch+0x76/0x95
Jan 22 22:40:40 zzz-zzz kernel: [176617.649135]  [<c10b61a0>] ?
kmem_cache_free+0x78/0xaf
Jan 22 22:40:40 zzz-zzz kernel: [176617.649138]  [<c1031770>] ?
finish_task_switch+0x76/0x95
Jan 22 22:40:40 zzz-zzz kernel: [1766Jan 23 07:13:24 zzz-zzz
syslog-ng[1807]: syslog-ng starting up; version='3.1.3'
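
In case it helps with comparing incidents, here is a minimal sketch of
how the soft lockup events could be pulled out of a kernel log. It is
an illustration only: the function name find_soft_lockups is made up,
and the pattern is based solely on the one message format shown above,
so other kernels may word the line differently.

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#13 stuck for 61s! [nagios3:2070]"
# (format assumed from the single log excerpt in this post).
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+?):(?P<pid>\d+)\]"
)


def find_soft_lockups(text):
    """Return one dict per soft lockup message found in the log text."""
    # Mail clients wrap long log lines, so collapse every run of
    # whitespace (including newlines) to a single space before matching.
    flat = " ".join(text.split())
    return [
        {
            "cpu": int(m["cpu"]),
            "seconds": int(m["secs"]),
            "comm": m["comm"],
            "pid": int(m["pid"]),
        }
        for m in LOCKUP_RE.finditer(flat)
    ]
```

Feeding it the contents of whichever file syslog-ng writes kernel
messages to (the exact path depends on the local configuration) gives
the CPU number, stuck time, process name and PID for each event.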

Cheers,
Blair

