
Bug#597489: Kswapd hanging: patch available from lkml



Hi, posting here since I am evidently reproducing this bug.

Under load (and a relatively mild one at that), a 24-core X5660 Dell
PowerEdge R710 with 24GB of RAM gets stuck with one core at 100% CPU
usage, spinning in kswapd. The peculiarity of the situation is that NO
swap is being used: the si and so columns in vmstat output show no swap
activity, even though swap was correctly set up. Also, the machine was
not completely out of RAM.
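
For anyone who wants to verify the "no swap activity" observation on
their own machine, here is a minimal sketch, not something I ran on the
affected host. It assumes the pswpin and pswpout counters in
/proc/vmstat, which are cumulative counts of pages swapped in and out.
The function names and the 5-second interval are my own choices for
illustration.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <non-negative integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Return (pages swapped in, pages swapped out) between two samples."""
    return (
        after.get("pswpin", 0) - before.get("pswpin", 0),
        after.get("pswpout", 0) - before.get("pswpout", 0),
    )


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the kernel's VM counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    first = read_vmstat()
    time.sleep(5)  # arbitrary sampling interval
    second = read_vmstat()
    swapped_in, swapped_out = swap_delta(first, second)
    print("pages swapped in: %d, swapped out: %d" % (swapped_in, swapped_out))
```

If both deltas stay at zero while kswapd0 is pinned at 100% CPU, that
matches what I saw: kswapd is busy reclaiming, but not by swapping.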

The machine eventually froze, so I was not able to get any information
apart from the kernel stack trace, which I post at the end of this
report.

This issue seems to be a known bug in the Linux kernel, and as far as I
understand a patch is available (and already included in RH kernels):

 http://kerneltrap.org/mailarchive/linux-kernel/2010/10/27/4637977

I'll try to reproduce the problem. In the meantime, do you think the
solution Mel proposed could be backported to the stable kernel?

Kernel stack trace (excerpt) is attached.

Best,
Giuseppe
-- 
Giuseppe Lavagetto, Ph.d.
Systems Manager and Developer - Gruppo Immobiliare.it s.r.l.
[86613.384580] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc drbd ixgbe dca lru_cache cn ipmi_si mpt2sas scsi_transport_sas mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu bonding ext3 jbd mbcache loop snd_pcm dcdbas joydev power_meter processor button psmouse snd_timer serio_raw snd soundcore snd_page_alloc evdev pcspkr xfs exportfs sg sr_mod cdrom ata_generic usbhid hid uhci_hcd sd_mod ses crc_t10dif enclosure thermal ehci_hcd ata_piix usbcore libata megaraid_sas nls_base scsi_mod bnx2 thermal_sys [last unloaded: drbd]
[86613.384610] CPU 2:
[86613.384611] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc drbd ixgbe dca lru_cache cn ipmi_si mpt2sas scsi_transport_sas mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu bonding ext3 jbd mbcache loop snd_pcm dcdbas joydev power_meter processor button psmouse snd_timer serio_raw snd soundcore snd_page_alloc evdev pcspkr xfs exportfs sg sr_mod cdrom ata_generic usbhid hid uhci_hcd sd_mod ses crc_t10dif enclosure thermal ehci_hcd ata_piix usbcore libata megaraid_sas nls_base scsi_mod bnx2 thermal_sys [last unloaded: drbd]
[86613.384635] Pid: 207, comm: kswapd0 Not tainted 2.6.32-5-amd64 #1 PowerEdge R710
[86613.384636] RIP: 0010:[<ffffffff810b3f19>]  [<ffffffff810b3f19>] find_get_pages+0x5f/0xbb
[86613.384645] RSP: 0018:ffff88062c869bc0  EFLAGS: 00000293
[86613.384646] RAX: ffffffffffffffff RBX: ffff88062c869c50 RCX: 0000000000000000
[86613.384648] RDX: 0000000000000040 RSI: ffffea0002bc56e0 RDI: ffffea0002bc56d8
[86613.384649] RBP: ffffffff8101166e R08: ffff88062c869b80 R09: 0000000000000002
[86613.384651] R10: 0000000000000040 R11: ffff880093d74ad8 R12: 0000000000000005
[86613.384653] R13: 0000000000000286 R14: ffff88000000b100 R15: ffff88000000c780
[86613.384655] FS:  0000000000000000(0000) GS:ffff88033ac20000(0000) knlGS:0000000000000000
[86613.384656] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[86613.384658] CR2: 00007fffd81dd038 CR3: 0000000001001000 CR4: 00000000000006e0
[86613.384659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[86613.384661] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[86613.384663] Call Trace:
[86613.384668]  [<ffffffff810bc034>] ? pagevec_lookup+0x17/0x1e
[86613.384671]  [<ffffffff810bcdf1>] ? invalidate_mapping_pages+0xb9/0xdb
[86613.384675]  [<ffffffff81100573>] ? shrink_icache_memory+0xfc/0x228
[86613.384678]  [<ffffffff810bf3f5>] ? shrink_slab+0xe0/0x153
[86613.384680]  [<ffffffff810bfc98>] ? kswapd+0x4d9/0x686
[86613.384683]  [<ffffffff810bd30f>] ? isolate_pages_global+0x0/0x20f
[86613.384687]  [<ffffffff81064e96>] ? autoremove_wake_function+0x0/0x2e
[86613.384691]  [<ffffffff8103aa56>] ? __wake_up_common+0x44/0x72
[86613.384693]  [<ffffffff810bf7bf>] ? kswapd+0x0/0x686
[86613.384695]  [<ffffffff81064bc9>] ? kthread+0x79/0x81
[86613.384700]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[86613.384702]  [<ffffffff81064b50>] ? kthread+0x0/0x81
[86613.384703]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
