
Bug#899044: Oops: 0000 [#1] SMP in skb_release_data, openvswitch related



Package: src:linux
Version: 4.9.88-1

Hi,

I'm seeing the attached errors on Xen dom0 machines running the latest
Debian Stretch 4.9 kernel. The errors have happened a few times over the
last few weeks. They started after the machines were upgraded from Jessie
with the 3.16 kernel to Stretch with the 4.9 kernel.

For networking between domUs and the outside world, we use openvswitch.

After such an error happens:
* The number of flows in the kernel quickly rises to the limit of
10000, as seen in the output of ovs-dpctl show.
* Network traffic that should flow through the openvswitch bridge starts
disappearing in a seemingly random way.
* The memory usage of the userspace ovs-vswitchd starts growing quickly.
* Many of the ovs commands, such as those that add or remove an interface
or bridge, hang.
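
To notice the first symptom before traffic starts disappearing, the flow
count can be polled. The following is only a minimal sketch, assuming that
the output of ovs-dpctl show contains one line of the form "flows: N" per
datapath. The sample text, the function names and the 90% threshold are
illustrative assumptions, not taken from the affected machines.

```python
import re
import subprocess

FLOW_LIMIT = 10000  # the flow limit mentioned in the report

# Hypothetical sample of `ovs-dpctl show` output; the exact layout is an
# assumption, only the "flows: N" line is relied upon.
SAMPLE = """system@ovs-system:
\tlookups: hit:100 missed:5 lost:0
\tflows: 10000
\tport 0: ovs-system (internal)
"""


def parse_flow_counts(show_output):
    """Return the flow count of every datapath found in the given text."""
    pattern = re.compile(r"^\s*flows:\s*(\d+)\s*$", re.MULTILINE)
    return [int(m.group(1)) for m in pattern.finditer(show_output)]


def near_limit(count, limit=FLOW_LIMIT, ratio=0.9):
    """True when the count has reached the given fraction of the limit."""
    return count >= limit * ratio


def current_flow_counts():
    """Run ovs-dpctl show (needs root and openvswitch) and parse it."""
    result = subprocess.run(
        ["ovs-dpctl", "show"], capture_output=True, text=True, check=True
    )
    return parse_flow_counts(result.stdout)
```

Calling near_limit() on each value returned by current_flow_counts() from
a cron job or a monitoring check would flag a machine whose flow table is
filling up.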

After restarting the openvswitch-switch service and fixing up the
configuration of a number of connected interfaces, functionality is restored.

While most of the symptoms seem related to the userspace openvswitch
processes, the cause of it all seems to be in the kernel: judging by the
call traces, the oops happens while a thread of the userspace ovs-vswitchd
process (revalidator) is receiving a netlink message.

Sadly I do not know how to reproduce this, except for just waiting until
it happens again.

Please advise what else I could provide to help resolve this issue.

Thanks,
Regards,
-- 
Hans van Kranenburg
May  4 08:23:03 altair kernel: [83978.662075] BUG: unable to handle kernel paging request at 000000030000001f
May  4 08:23:03 altair kernel: [83978.665887] IP: [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May  4 08:23:03 altair kernel: [83978.669837] PGD 0 
May  4 08:23:03 altair kernel: [83978.669882] 
May  4 08:23:03 altair kernel: [83978.673589] Oops: 0000 [#1] SMP
May  4 08:23:03 altair kernel: [83978.677281] Modules linked in: cls_u32 sch_ingress act_mirred sch_fq_codel ifb xt_mark sch_htb xt_physdev br_netfilter bridge stp llc xen_netback xen_blkback algif_skcipher af_alg dm_service_time binfmt_misc xen_gntdev xen_evtchn openvswitch nf_nat_ipv6 libcrc32c xenfs xen_privcmd ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_owner xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw dm_crypt intel_powerclamp crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcspkr serio_raw joydev evdev amdkfd radeon ttm drm_kms_helper drm i2c_algo_bit lpc_ich mfd_core i7core_edac hpilo
May  4 08:23:03 altair kernel: [83978.701936]  sg ipmi_si hpwdt edac_core ipmi_msghandler acpi_power_meter button shpchp dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq mlx4_en ptp pps_core hid_generic usbhid hid sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ehci_pci uhci_hcd ehci_hcd usbcore usb_common hpsa scsi_transport_sas bnx2 mlx4_core devlink scsi_mod thermal
May  4 08:23:03 altair kernel: [83978.724406] CPU: 1 PID: 1486 Comm: revalidator7 Not tainted 4.9.0-6-amd64 #1 Debian 4.9.88-1
May  4 08:23:03 altair kernel: [83978.729139] Hardware name: HP ProLiant DL360 G7, BIOS P68 08/16/2015
May  4 08:23:03 altair kernel: [83978.733958] task: ffff880119e1ee80 task.stack: ffffc90042764000
May  4 08:23:03 altair kernel: [83978.738724] RIP: e030:[<ffffffff814f5c7d>]  [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May  4 08:23:03 altair kernel: [83978.743560] RSP: e02b:ffffc90042767c78  EFLAGS: 00010206
May  4 08:23:03 altair kernel: [83978.748352] RAX: 0000000000000050 RBX: 00000002ffffffff RCX: ffffffff81ce0f40
May  4 08:23:03 altair kernel: [83978.753116] RDX: ffffffffffffffff RSI: ffff8800cc998900 RDI: ffff8800cc998900
May  4 08:23:03 altair kernel: [83978.757867] RBP: ffff8800cc998900 R08: ffff880123c00000 R09: ffff88011f220000
May  4 08:23:03 altair kernel: [83978.762598] R10: ffff8800cc998900 R11: ffff880119e10280 R12: 0000000000000002
May  4 08:23:03 altair kernel: [83978.767321] R13: ffff88011f227ec0 R14: ffff88011dea2800 R15: 0000000000000000
May  4 08:23:03 altair kernel: [83978.772000] FS:  00007fc1656cc700(0000) GS:ffff880128240000(0000) knlGS:0000000000000000
May  4 08:23:03 altair kernel: [83978.776671] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May  4 08:23:03 altair kernel: [83978.781355] CR2: 000000030000001f CR3: 00000001212b1000 CR4: 0000000000002660
May  4 08:23:03 altair kernel: [83978.786135] Stack:
May  4 08:23:03 altair kernel: [83978.790841]  ffff880120a28000 ffff8800cc998900 ffffc90042767ec0 0000000000007ea4
May  4 08:23:03 altair kernel: [83978.795898]  ffffffff814f6267 ffff880120a28000 ffff8800cc998900 ffffffff814fcc91
May  4 08:23:03 altair kernel: [83978.800806]  ffff880120a28000 ffffffff8153f2df ffffc90000000000 ffff8800cc998900
May  4 08:23:03 altair kernel: [83978.805723] Call Trace:
May  4 08:23:03 altair kernel: [83978.810654]  [<ffffffff814f6267>] ? consume_skb+0x27/0x80
May  4 08:23:03 altair kernel: [83978.815626]  [<ffffffff814fcc91>] ? skb_free_datagram+0x11/0x40
May  4 08:23:03 altair kernel: [83978.820545]  [<ffffffff8153f2df>] ? netlink_recvmsg+0x19f/0x440
May  4 08:23:03 altair kernel: [83978.825426]  [<ffffffff814ed4ca>] ? ___sys_recvmsg+0xda/0x1f0
May  4 08:23:03 altair kernel: [83978.830273]  [<ffffffff812237fb>] ? file_update_time+0xcb/0x110
May  4 08:23:03 altair kernel: [83978.835058]  [<ffffffff8120fbeb>] ? pipe_write+0x29b/0x3e0
May  4 08:23:03 altair kernel: [83978.839800]  [<ffffffff812066b0>] ? new_sync_write+0xe0/0x130
May  4 08:23:03 altair kernel: [83978.844502]  [<ffffffff814edf4e>] ? __sys_recvmsg+0x4e/0x90
May  4 08:23:03 altair kernel: [83978.849161]  [<ffffffff81003b7d>] ? do_syscall_64+0x8d/0xf0
May  4 08:23:03 altair kernel: [83978.853779]  [<ffffffff8161244e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
May  4 08:23:03 altair kernel: [83978.858397] Code: 03 48 c1 e8 37 83 e0 07 83 f8 04 74 49 41 0f b6 45 00 41 83 c4 01 44 39 e0 7e 51 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 5c 05 00 <48> 8b 43 20 48 8d 50 ff a8 01 48 0f 45 da f0 ff 4b 1c 75 bf 48 
May  4 08:23:03 altair kernel: [83978.868227] RIP  [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May  4 08:23:03 altair kernel: [83978.873017]  RSP <ffffc90042767c78>
May  4 08:23:03 altair kernel: [83978.877746


May  4 22:00:22 sirius kernel: [1999361.378086] BUG: unable to handle kernel NULL pointer dereference at 00000000000001e0
May  4 22:00:22 sirius kernel: [1999361.381804] IP: [<ffffffff814f4c7d>] skb_release_data+0x8d/0x110
May  4 22:00:22 sirius kernel: [1999361.385492] PGD 0 
May  4 22:00:22 sirius kernel: [1999361.385535] 
May  4 22:00:22 sirius kernel: [1999361.389145] Oops: 0000 [#1] SMP
May  4 22:00:22 sirius kernel: [1999361.392725] Modules linked in: xt_physdev br_netfilter bridge stp llc xen_netback xen_blkback algif_skcipher af_alg dm_service_time binfmt_misc openvswitch nf_nat_ipv6 libcrc32c xen_gntdev xen_evtchn xenfs xen_privcmd ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_owner xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw dm_crypt intel_powerclamp crct10dif_pclmul crc32_pclmul amdkfd iTCO_wdt evdev joydev iTCO_vendor_support ghash_clmulni_intel radeon ttm serio_raw pcspkr drm_kms_helper drm i2c_algo_bit sg i7core_edac lpc_ich ipmi_si acpi_power_meter hpilo hpwdt mfd_core edac_core ipmi_msghandler button
May  4 22:00:22 sirius kernel: [1999361.416634]  shpchp dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq mlx4_en ptp pps_core hid_generic usbhid hid sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ehci_pci uhci_hcd ehci_hcd usbcore usb_common mlx4_core hpsa scsi_transport_sas bnx2 devlink scsi_mod thermal
May  4 22:00:22 sirius kernel: [1999361.438322] CPU: 2 PID: 1400 Comm: revalidator9 Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
May  4 22:00:22 sirius kernel: [1999361.442773] Hardware name: HP ProLiant DL360 G7, BIOS P68 08/16/2015
May  4 22:00:22 sirius kernel: [1999361.447219] task: ffff880111c58540 task.stack: ffffc90041bcc000
May  4 22:00:22 sirius kernel: [1999361.451796] RIP: e030:[<ffffffff814f4c7d>]  [<ffffffff814f4c7d>] skb_release_data+0x8d/0x110
May  4 22:00:22 sirius kernel: [1999361.456294] RSP: e02b:ffffc90041bcfc78  EFLAGS: 00010206
May  4 22:00:22 sirius kernel: [1999361.460758] RAX: 0000000000000030 RBX: 00000000000001c0 RCX: ffffffff81ce0e00
May  4 22:00:22 sirius kernel: [1999361.465261] RDX: 0000000000008100 RSI: ffff880118a94f00 RDI: ffff880118a94f00
May  4 22:00:22 sirius kernel: [1999361.469724] RBP: ffff880118a94f00 R08: ffff88011bc00000 R09: ffff8800b0218000
May  4 22:00:22 sirius kernel: [1999361.474230] R10: ffff880118a94f00 R11: ffff880111c50240 R12: 0000000000000000
May  4 22:00:22 sirius kernel: [1999361.478710] R13: ffff8800b021fec0 R14: ffff8800b8356a40 R15: 0000000000000000
May  4 22:00:22 sirius kernel: [1999361.483220] FS:  00007faa54946700(0000) GS:ffff880120280000(0000) knlGS:0000000000000000
May  4 22:00:22 sirius kernel: [1999361.487736] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May  4 22:00:22 sirius kernel: [1999361.492181] CR2: 00000000000001e0 CR3: 0000000119ce9000 CR4: 0000000000002660
May  4 22:00:22 sirius kernel: [1999361.496661] Stack:
May  4 22:00:22 sirius kernel: [1999361.501036]  ffff8801190a2800 ffff880118a94f00 ffffc90041bcfec0 0000000000007eac
May  4 22:00:22 sirius kernel: [1999361.505470]  ffffffff814f5267 ffff8801190a2800 ffff880118a94f00 ffffffff814fbc91
May  4 22:00:22 sirius kernel: [1999361.509844]  ffff8801190a2800 ffffffff8153e2bf ffffc90000000000 ffff880118a94f00
May  4 22:00:22 sirius kernel: [1999361.514213] Call Trace:
May  4 22:00:22 sirius kernel: [1999361.518499]  [<ffffffff814f5267>] ? consume_skb+0x27/0x80
May  4 22:00:22 sirius kernel: [1999361.522818]  [<ffffffff814fbc91>] ? skb_free_datagram+0x11/0x40
May  4 22:00:22 sirius kernel: [1999361.527109]  [<ffffffff8153e2bf>] ? netlink_recvmsg+0x19f/0x440
May  4 22:00:22 sirius kernel: [1999361.531314]  [<ffffffff814ec4ca>] ? ___sys_recvmsg+0xda/0x1f0
May  4 22:00:22 sirius kernel: [1999361.535488]  [<ffffffff812221ab>] ? file_update_time+0xcb/0x110
May  4 22:00:22 sirius kernel: [1999361.539626]  [<ffffffff8120e5cb>] ? pipe_write+0x29b/0x3e0
May  4 22:00:22 sirius kernel: [1999361.543790]  [<ffffffff812050a0>] ? new_sync_write+0xe0/0x130
May  4 22:00:22 sirius kernel: [1999361.547989]  [<ffffffff814ecf4e>] ? __sys_recvmsg+0x4e/0x90
May  4 22:00:22 sirius kernel: [1999361.552218]  [<ffffffff81003b7f>] ? do_syscall_64+0x8f/0xf0
May  4 22:00:22 sirius kernel: [1999361.556467]  [<ffffffff816113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
May  4 22:00:22 sirius kernel: [1999361.560791] Code: 03 48 c1 e8 37 83 e0 07 83 f8 04 74 49 41 0f b6 45 00 41 83 c4 01 44 39 e0 7e 51 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 5c 05 00 <48> 8b 43 20 48 8d 50 ff a8 01 48 0f 45 da f0 ff 4b 1c 75 bf 48 
May  4 22:00:22 sirius kernel: [1999361.570202] RIP  [<ffffffff814f4c7d>] skb_release_data+0x8d/0x110
May  4 22:00:22 sirius kernel: [1999361.575033]  RSP <ffffc90041bcfc78>
May  4 22:00:22 sirius kernel: [1999361.579731] CR2: 00000000000001e0
May  4 22:00:22 sirius kernel: [1999361.599233] ---[ end trace de6345fc470c5362 ]---



May 18 13:49:26 omega kernel: [1213243.942643] general protection fault: 0000 [#1] SMP
May 18 13:49:26 omega kernel: [1213243.946704] Modules linked in: xt_physdev br_netfilter bridge stp llc xen_netback xen_blkback algif_skcipher af_alg dm_service_time xen_gntdev openvswitch xen_evtchn nf_nat_ipv6 libcrc32c xenfs xen_privcmd ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_owner xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw dm_crypt amdkfd radeon intel_powerclamp crct10dif_pclmul iTCO_wdt crc32_pclmul iTCO_vendor_support ttm ghash_clmulni_intel hpwdt pcspkr drm_kms_helper drm serio_raw evdev i2c_algo_bit joydev sg hpilo lpc_ich mfd_core i7core_edac ipmi_si edac_core ipmi_msghandler acpi_power_meter shpchp button dm_multipath
May 18 13:49:26 omega kernel: [1213243.973478]  dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq mlx4_en ptp pps_core hid_generic usbhid hid sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse uhci_hcd ehci_pci ehci_hcd usbcore usb_common hpsa bnx2 mlx4_core scsi_transport_sas devlink scsi_mod thermal
May 18 13:49:26 omega kernel: [1213243.997290] CPU: 2 PID: 1582 Comm: revalidator9 Not tainted 4.9.0-6-amd64 #1 Debian 4.9.88-1
May 18 13:49:26 omega kernel: [1213244.002200] Hardware name: HP ProLiant DL360 G7, BIOS P68 08/16/2015
May 18 13:49:26 omega kernel: [1213244.007157] task: ffff8801186caf00 task.stack: ffffc90041b8c000
May 18 13:49:26 omega kernel: [1213244.012040] RIP: e030:[<ffffffff814f5c7d>]  [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May 18 13:49:26 omega kernel: [1213244.016957] RSP: e02b:ffffc90041b8fc78  EFLAGS: 00010206
May 18 13:49:26 omega kernel: [1213244.021783] RAX: 0000000000000030 RBX: 290008a753b675a9 RCX: ffffffff81ce0f40
May 18 13:49:26 omega kernel: [1213244.026673] RDX: 0000000000008100 RSI: ffff88011a2c6200 RDI: ffff88011a2c6200
May 18 13:49:26 omega kernel: [1213244.031596] RBP: ffff88011a2c6200 R08: ffff88011bc00000 R09: ffff88011aa70000
May 18 13:49:26 omega kernel: [1213244.036422] R10: ffff88011a2c6200 R11: ffff8801186c0200 R12: 0000000000000000
May 18 13:49:26 omega kernel: [1213244.041267] R13: ffff88011aa77ec0 R14: ffff8801199da7c0 R15: 0000000000000000
May 18 13:49:26 omega kernel: [1213244.046055] FS:  00007fe5f35e2700(0000) GS:ffff880120280000(0000) knlGS:0000000000000000
May 18 13:49:26 omega kernel: [1213244.050785] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May 18 13:49:26 omega kernel: [1213244.055488] CR2: 00007fe579f4f059 CR3: 0000000117428000 CR4: 0000000000002660
May 18 13:49:26 omega kernel: [1213244.060232] Stack:
May 18 13:49:26 omega kernel: [1213244.064896]  ffff880117588800 ffff88011a2c6200 ffffc90041b8fec0 0000000000007e94
May 18 13:49:26 omega kernel: [1213244.069725]  ffffffff814f6267 ffff880117588800 ffff88011a2c6200 ffffffff814fcc91
May 18 13:49:26 omega kernel: [1213244.074552]  ffff880117588800 ffffffff8153f2df ffffc90000000000 ffff88011a2c6200
May 18 13:49:26 omega kernel: [1213244.079377] Call Trace:
May 18 13:49:26 omega kernel: [1213244.084123]  [<ffffffff814f6267>] ? consume_skb+0x27/0x80
May 18 13:49:26 omega kernel: [1213244.089047]  [<ffffffff814fcc91>] ? skb_free_datagram+0x11/0x40
May 18 13:49:26 omega kernel: [1213244.093728]  [<ffffffff8153f2df>] ? netlink_recvmsg+0x19f/0x440
May 18 13:49:26 omega kernel: [1213244.098359]  [<ffffffff814ed4ca>] ? ___sys_recvmsg+0xda/0x1f0
May 18 13:49:26 omega kernel: [1213244.102962]  [<ffffffff812237fb>] ? file_update_time+0xcb/0x110
May 18 13:49:26 omega kernel: [1213244.107530]  [<ffffffff8120fbeb>] ? pipe_write+0x29b/0x3e0
May 18 13:49:26 omega kernel: [1213244.112074]  [<ffffffff812066b0>] ? new_sync_write+0xe0/0x130
May 18 13:49:26 omega kernel: [1213244.116625]  [<ffffffff814edf4e>] ? __sys_recvmsg+0x4e/0x90
May 18 13:49:26 omega kernel: [1213244.121183]  [<ffffffff81003b7d>] ? do_syscall_64+0x8d/0xf0
May 18 13:49:26 omega kernel: [1213244.125715]  [<ffffffff8161244e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
May 18 13:49:26 omega kernel: [1213244.130196] Code: 03 48 c1 e8 37 83 e0 07 83 f8 04 74 49 41 0f b6 45 00 41 83 c4 01 44 39 e0 7e 51 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 5c 05 00 <48> 8b 43 20 48 8d 50 ff a8 01 48 0f 45 da f0 ff 4b 1c 75 bf 48 
May 18 13:49:26 omega kernel: [1213244.139830] RIP  [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May 18 13:49:26 omega kernel: [1213244.144491]  RSP <ffffc90041b8fc78>
May 18 13:49:26 omega kernel: [1213244.164037] ---[ end trace c53e06696e145c33 ]---
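
The three traces can be compared mechanically. The bytes marked
"<48> 8b 43 20" in the Code: lines decode to a 64-bit load from RBX + 0x20,
so for the two page faults CR2 should equal RBX + 0x20. In the third trace
RBX is not a canonical x86-64 address (assuming 48-bit virtual addresses),
which would explain why it shows up as a general protection fault with an
unrelated CR2 value. The sketch below checks this on register lines copied
from the traces above. The function names are illustrative, and it is a
reading aid rather than a diagnosis; it only suggests that RBX held a bad
pointer in all three cases.

```python
import re

# Register lines copied from the three traces (syslog prefix dropped).
ALTAIR = """
RAX: 0000000000000050 RBX: 00000002ffffffff RCX: ffffffff81ce0f40
CR2: 000000030000001f CR3: 00000001212b1000 CR4: 0000000000002660
"""
SIRIUS = """
RAX: 0000000000000030 RBX: 00000000000001c0 RCX: ffffffff81ce0e00
CR2: 00000000000001e0 CR3: 0000000119ce9000 CR4: 0000000000002660
"""
OMEGA = """
RAX: 0000000000000030 RBX: 290008a753b675a9 RCX: ffffffff81ce0f40
CR2: 00007fe579f4f059 CR3: 0000000117428000 CR4: 0000000000002660
"""

REGISTER = re.compile(r"\b(R[A-Z0-9]{1,2}|CR[0-9]):\s*([0-9a-f]{16})\b")


def parse_registers(oops_text):
    """Map register names to integer values, keeping the first occurrence."""
    regs = {}
    for name, value in REGISTER.findall(oops_text):
        regs.setdefault(name, int(value, 16))
    return regs


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit virtual addressing)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def fault_offset(oops_text):
    """Distance between the faulting address (CR2) and RBX."""
    regs = parse_registers(oops_text)
    return regs["CR2"] - regs["RBX"]
```

For the altair and sirius traces fault_offset() gives 0x20, matching the
displacement in the faulting instruction. For omega, is_canonical() applied
to RBX returns False.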
