
Bug#823434: Crash in wheezy-backports kernels on skb_release_data



Package: linux-image-3.16.0-0.bpo.4-amd64
Version: multiple

We're seeing a crash once or twice a week across multiple machines running various versions of the wheezy-backports 3.16 kernel. Here is an excerpt from one of the dumps we captured. Unfortunately, the top was cut off by the screen size, but the most important parts (registers, stack and call trace) appear to be intact:

task: ffff880199212b60 ti: ffff880199220000 task.ti: ffff880199220000
RIP: 0010:[<ffffffff8140b5a3>]  [<ffffffff8140b5a3>] skb_release_data+0xe3/0x110
RSP: 0018:ffff88031fae3c20  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff8802b1c448c0 RCX: 00000000b1c44800
RDX: 0000000000000060 RSI: 00000000fffffe01 RDI: 0101010101010101
RBP: 0000000000000001 R08: ffff880318d3d798 R09: 0000000000000002
R10: ffff8802d5f75e00 R11: 000000000000001b R12: ffff8802d5f75e00
R13: 000000000000003b R14: ffff8802a69fca10 R15: ffff8802ef350100
FS:  0000000000000000(0000) GS:ffff88031fae0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 0000000001813000 CR4: 00000000000007e0
Stack:
  0000000000004f00 ffff8802d5f75e00 0000000000000000 ffffffff8140b817
  0000000000004f00 ffff8802d5f75e00 ffffffff81456058 0000001400000000
  0000004f00000010 0000058eb1c44800 ffff8802fc4a0000 0000000000000010
Call Trace:
 <IRQ>
 [<ffffffff8140b817>] ? consume_skb+0x27/0x80
 [<ffffffff81456058>] ? ip_fragment+0x5b8/0x880
 [<ffffffff81455690>] ? skb_set_owner_w+0x50/0x50
 [<ffffffff8145685c>] ? ip_finish_output+0x53c/0x840
 [<ffffffff8141b563>] ? __netif_receive_skb_core+0x533/0x750
 [<ffffffff8101b465>] ? read_tsc+0x5/0x20
 [<ffffffff8141b7ff>] ? netif_receive_skb_internal+0x1f/0x90
 [<ffffffff8141be6f>] ? dev_gro_receive+0x1df/0x2e0
 [<ffffffff8141c257>] ? napi_gro_receive+0x27/0xe0
 [<ffffffffa0051424>] ? bnx2_poll_work+0x7e4/0x1230 [bnx2]
 [<ffffffffa005189d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
 [<ffffffff8141bb90>] ? net_rx_action+0x140/0x240
 [<ffffffff8106c591>] ? __do_softirq+0xf1/0x290
 [<ffffffff8106c965>] ? irq_exit+0x95/0xa0
 [<ffffffff81512542>] ? do_IRQ+0x52/0xe0
 [<ffffffff815103ed>] ? common_interrupt+0x6d/0x6d
 <EOI>
 [<ffffffff813dc77f>] ? cpuidle_enter_state+0x4f/0xc0
 [<ffffffff813dc778>] ? cpuidle_enter_state+0x48/0xc0
 [<ffffffff810a7d78>] ? cpu_startup_entry+0x2f8/0x400
 [<ffffffff81042bef>] ? start_secondary+0x20f/0x2d0
Code: 8b 9c 24 cc 00 00 00 49 03 9c 24 d0 00 00 00 48 8b 7b 08 48 85 ff 75 13 5b 5d 4c 89 e7 41 5c e9 a4 fe ff ff 0f 1f 40 00 48 89 ef <48> 8b 2f e8 75 00 00 00 48 85 ed 75 f0 48 c7 43 08 00 00 00 00
RIP  [<ffffffff8140b5a3>] skb_release_data+0xe3/0x110
 RSP <ffff88031fae3c20>
---[ end trace 35ec23bf75ec9349 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
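
As a reading aid for the dump above, here is a minimal Python sketch that picks the faulting instruction out of the Code: line (the byte in angle brackets marks where RIP pointed) and checks whether the register it dereferences holds a canonical x86-64 address. The helper names are hypothetical, the decoder handles only the single "REX.W 8B /r" register-indirect form seen here, and the canonical check assumes 48-bit virtual addresses. It is not a general disassembler and does not identify the root cause.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def faulting_bytes(code_line):
    """Return the bytes from the <xx>-marked byte onward in an oops Code: line."""
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [int(t.strip("<>"), 16) for t in tokens[i:]]
    raise ValueError("no <..> marker in Code: line")

def decode_mov_load(b):
    """Decode 'mov (%base),%dest' for REX.W + 8B /r with mod=00 only."""
    if b[0] != 0x48 or b[1] != 0x8B:
        raise ValueError("not a REX.W 8B load")
    mod, reg, rm = b[2] >> 6, (b[2] >> 3) & 7, b[2] & 7
    if mod != 0 or rm in (4, 5):  # SIB / RIP-relative forms not handled
        raise ValueError("unsupported addressing form")
    return REGS[reg], REGS[rm]  # (destination, base register)

def is_canonical(addr):
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1

# Tail of the Code: line from the dump, and the RDI value from the registers.
CODE = "Code: 48 89 ef <48> 8b 2f e8 75 00 00 00 48 85 ed 75 f0"
RDI = 0x0101010101010101

dest, base = decode_mov_load(faulting_bytes(CODE))
print(f"faulting insn: mov (%{base}),%{dest}; "
      f"{base} canonical: {is_canonical(RDI)}")
```

Under those assumptions the faulting instruction is a load through RDI, and RDI (0101010101010101) is not a canonical address, so the pointer being followed looks like overwritten memory rather than a stale but valid kernel pointer. That would be consistent with memory corruption, but it is only a hint.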

Some sleuthing led me to https://lkml.org/lkml/2016/1/6/627. That report doesn't look exactly the same, but it may be related, particularly since we are routing between networks with a variety of MTU values and fragmentation is common. The fix for that issue is at https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=9207f9d45b0ad071baa128e846d7e7ed85016df3
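
To illustrate why mixed MTUs send so much traffic through ip_fragment on a router, here is a rough Python sketch of the IPv4 fragmentation arithmetic. The function name and the 1500/1400 byte sizes are made-up examples, not values from our network, and a 20-byte header without options is assumed.

```python
def fragment_sizes(total_len, mtu, ihl=20):
    """Split an IPv4 packet of total_len bytes (header included) for a link
    with the given MTU. Returns (offset in 8-byte units, fragment length)."""
    if total_len <= mtu:
        return [(0, total_len)]          # fits, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - ihl) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    payload = total_len - ihl
    frags, off = [], 0
    while off < payload:
        chunk = min(max_data, payload - off)
        frags.append((off // 8, chunk + ihl))
        off += chunk
    return frags

# Hypothetical case: a full 1500-byte packet forwarded onto a 1400-byte link.
print(fragment_sizes(1500, 1400))
```

In this example every full-size packet arriving from the larger-MTU side is split in two on the way out, so the fragmentation path runs for a large share of forwarded traffic.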

--
James Oakley
james.oakley@multapplied.net

