Bug#767261: [Pkg-xen-devel] Bug#767261: xen-hypervisor-4.4-amd64: host lockup when DomU network iface is down



On 11/08/2014 05:39 AM, Ian Campbell wrote:
On Sat, 2014-11-08 at 00:40 -0500, Gedalya wrote:
On 11/07/2014 03:25 AM, Ian Campbell wrote:
On Thu, 2014-11-06 at 11:06 -0500, Gedalya wrote:
I suspect we will need to backport some xen-netback patch or other. I've
put some feelers out to see if any of the upstream devs have any
hints...
OK so if it's just a matter of changing a kernel on one box, I can
perhaps try to build a 3.18 this weekend
I think these commits, which are in v3.18-rc3, are probably the ones:

ecf08d2 xen-netback: reintroduce guest Rx stall detection
f48da8b xen-netback: fix unlimited guest Rx internal queue and carrier flapping
bc96f64 xen-netback: make feature-rx-notify mandatory

I'll investigate a backport/check if they are destined for stable@.

Ian.

Tried to just frankenport xen-netback from 3.18 into 3.16; it didn't
work very well ;-)
Did you backport just the above or the full set of changes from 3.18?
I tried to "simplify" (i.e. avoid having to edit code myself) by just
copying the full xen-netback from 3.18 as it is. I did have to revert
"c835a6 net: set name_assign_type in alloc_netdev()" to get it to
compile, but then it gave me a kernel BUG as soon as a Xen guest booted
up.
(see attached if it matters)
I'll try to apply just those 3 patches and see how it goes.



I'm running 3.18-rc3+ now. Bombarding the downed interface by
broadcast-pinging the network it's on causes the following:
[  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled
[  281.396080] breth1: port 3(vif3.0) entered disabled state
and that's it. This is instead of the previously repeated 'draining TX
queue' messages.
Let's assume it won't crash; I'll let you know if this assumption turns
out to be wrong.

I'm kind of curious why this is preceded by
[   46.232475] vif vif-3-0 vif3.0: Guest Rx ready
[   46.232514] IPv6: ADDRCONF(NETDEV_CHANGE): vif3.0: link becomes ready
And the host figures out it's down only when traffic comes and doesn't
get through.
I guess this might change if I run 3.18 in the guest too?
I *think* this is the intended behaviour of "xen-netback: reintroduce
guest Rx stall detection": since the interface is down on the guest
side, it is considered stalled (i.e. not processing any packets).

The "link becomes ready" message, I think, refers to the backend end of
the connection; it's like a network cable only plugged in at one end or
something. Perhaps things could be smarter, but that would be an
upstream thing I think.
OK, makes sense. Thanks!
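
For anyone who wants to watch for the same transitions on their own
host, here is a minimal sketch in Python that picks the "Guest Rx ready"
and "Guest Rx stalled" lines out of kernel log output. The names
VIF_EVENT and vif_rx_events are made up for this illustration, and the
pattern assumes the message format is exactly as in the lines quoted
above; other kernel versions may word it differently.

```python
import re

# Hypothetical helper, not part of any Xen or kernel tooling.
# Matches lines such as:
#   [  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled
# and also works when a syslog prefix precedes the bracketed timestamp.
VIF_EVENT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+vif\s+\S+\s+(?P<iface>\S+):"
    r"\s+Guest Rx (?P<state>ready|stalled)"
)


def vif_rx_events(lines):
    """Yield (timestamp, interface, state) for each Guest Rx line found."""
    for line in lines:
        match = VIF_EVENT.search(line)
        if match:
            yield (
                float(match.group("ts")),
                match.group("iface"),
                match.group("state"),
            )


# Sample input copied from the log lines quoted in this thread.
sample = [
    "[   46.232475] vif vif-3-0 vif3.0: Guest Rx ready",
    "[   46.232514] IPv6: ADDRCONF(NETDEV_CHANGE): vif3.0: link becomes ready",
    "[  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled",
    "[  281.396080] breth1: port 3(vif3.0) entered disabled state",
]

events = list(vif_rx_events(sample))
for ts, iface, state in events:
    print(f"{ts:12.6f} {iface} {state}")
```

The same function can be fed the lines of a saved kernel log file; it
only reports the backend's view of the guest Rx path, not the guest's
own link state.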

Nov  7 23:30:59 xen kernel: [   31.990845] BUG: unable to handle kernel NULL pointer dereference at           (null)
Nov  7 23:30:59 xen kernel: [   31.990862] IP: [<ffffffff812ae67c>] strcmp+0xc/0x30
Nov  7 23:30:59 xen kernel: [   31.990871] PGD 0 
Nov  7 23:30:59 xen kernel: [   31.990876] Oops: 0000 [#1] SMP 
Nov  7 23:30:59 xen kernel: [   31.990882] Modules linked in: xen_netback(+) xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge stp llc it87 hwmon_vid snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support coretemp pps_ldisc pps_core i2c_i801 snd_hda_codec_realtek lpc_ich snd_hda_codec_generic mfd_core nouveau ppdev evdev snd_hda_intel mxm_wmi pcspkr snd_hda_controller tpm_infineon tpm_tis i7core_edac edac_core snd_hda_codec serio_raw video tpm snd_hwdep ttm drm_kms_helper drm i2c_algo_bit i2c_core snd_pcm snd_timer snd parport_pc parport wmi soundcore shpchp button processor thermal_sys ext4 crc16 mbcache jbd2 dm_mod ata_generic sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common crc32c_intel firewire_ohci firewire_core crc_itu_t r8169 mii ahci pata_jmicron libahci ehci_pci uhci_hcd xhci_hcd libata ehci_hcd megaraid_sas usbcore usb_common scsi_mod
Nov  7 23:30:59 xen kernel: [   31.992670] CPU: 0 PID: 2470 Comm: udevd Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-2
Nov  7 23:30:59 xen kernel: [   31.993478] Hardware name: Gigabyte Technology Co., Ltd. P55A-UD4P/P55A-UD4P, BIOS F13 08/10/2010
Nov  7 23:30:59 xen kernel: [   31.994297] task: ffff8800028ff530 ti: ffff88001d6ec000 task.ti: ffff88001d6ec000
Nov  7 23:30:59 xen kernel: [   31.995114] RIP: e030:[<ffffffff812ae67c>]  [<ffffffff812ae67c>] strcmp+0xc/0x30
Nov  7 23:30:59 xen kernel: [   31.995933] RSP: e02b:ffff88001d6efcf0  EFLAGS: 00010202
Nov  7 23:30:59 xen kernel: [   31.996744] RAX: 0000000000000076 RBX: ffff88001f622b80 RCX: 0000000000000002
Nov  7 23:30:59 xen kernel: [   31.997556] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff88001fdb3541
Nov  7 23:30:59 xen kernel: [   31.998373] RBP: ffff88002097a5c0 R08: 0000000000000004 R09: 0000000000000008
Nov  7 23:30:59 xen kernel: [   31.999189] R10: ffffffff818e1880 R11: 0000000000003ecd R12: 0000000000000000
Nov  7 23:30:59 xen kernel: [   31.999994] R13: ffffffffa0676000 R14: ffffffffa0673390 R15: 0000000000000001
Nov  7 23:30:59 xen kernel: [   32.000803] FS:  00007f6db3a88880(0000) GS:ffff88002ec00000(0000) knlGS:0000000000000000
Nov  7 23:30:59 xen kernel: [   32.001607] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov  7 23:30:59 xen kernel: [   32.002406] CR2: 0000000000000000 CR3: 000000001f9fa000 CR4: 0000000000002660
Nov  7 23:30:59 xen kernel: [   32.003203] Stack:
Nov  7 23:30:59 xen kernel: [   32.003985]  ffffffff812a9c70 ffffffffa0673058 ffff88001e89d660 0000000000000000
Nov  7 23:30:59 xen kernel: [   32.004774]  ffffffff8139fb8f ffffffffa0673058 ffffffff8139fc0e ffffffff8181a020
Nov  7 23:30:59 xen kernel: [   32.005561]  ffff88001e89d660 ffffffffa0676059 ffffffff8181a020 ffffffff8100213c
Nov  7 23:30:59 xen kernel: [   32.006342] Call Trace:
Nov  7 23:30:59 xen kernel: [   32.007109]  [<ffffffff812a9c70>] ? kset_find_obj+0x30/0x80
Nov  7 23:30:59 xen kernel: [   32.007888]  [<ffffffff8139fb8f>] ? driver_find+0x1f/0x50
Nov  7 23:30:59 xen kernel: [   32.008661]  [<ffffffff8139fc0e>] ? driver_register+0x4e/0xe0
Nov  7 23:30:59 xen kernel: [   32.009429]  [<ffffffffa0676059>] ? netback_init+0x59/0x1000 [xen_netback]
Nov  7 23:30:59 xen kernel: [   32.010191]  [<ffffffff8100213c>] ? do_one_initcall+0xcc/0x200
Nov  7 23:30:59 xen kernel: [   32.010941]  [<ffffffff8118b768>] ? kfree+0x118/0x220
Nov  7 23:30:59 xen kernel: [   32.011688]  [<ffffffff81142b41>] ? free_hot_cold_page+0x111/0x180
Nov  7 23:30:59 xen kernel: [   32.012444]  [<ffffffff8118b768>] ? kfree+0x118/0x220
Nov  7 23:30:59 xen kernel: [   32.013203]  [<ffffffff810d8aaa>] ? load_module+0x20da/0x26b0
Nov  7 23:30:59 xen kernel: [   32.013962]  [<ffffffff810d46b0>] ? store_uevent+0x40/0x40
Nov  7 23:30:59 xen kernel: [   32.014714]  [<ffffffff810d91dd>] ? SyS_finit_module+0x7d/0xa0
Nov  7 23:30:59 xen kernel: [   32.015458]  [<ffffffff8150cc2d>] ? system_call_fast_compare_end+0x10/0x15
Nov  7 23:30:59 xen kernel: [   32.016196] Code: 83 c6 01 0f b6 4e ff 48 83 c2 01 84 c9 88 4a ff 75 ed f3 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 83 c7 01 0f b6 47 ff 48 83 c6 01 <3a> 46 ff 75 0f 84 c0 75 eb 31 c0 c3 0f 1f 84 00 00 00 00 00 19 
Nov  7 23:30:59 xen kernel: [   32.017803] RIP  [<ffffffff812ae67c>] strcmp+0xc/0x30
Nov  7 23:30:59 xen kernel: [   32.018585]  RSP <ffff88001d6efcf0>
Nov  7 23:30:59 xen kernel: [   32.019363] CR2: 0000000000000000
Nov  7 23:30:59 xen kernel: [   32.020156] ---[ end trace 68f28a553efa277a ]---
