
Bug#767261: [Pkg-xen-devel] Bug#767261: xen-hypervisor-4.4-amd64: host lockup when DomU network iface is down



On Sat, 2014-11-08 at 00:40 -0500, Gedalya wrote:
> On 11/07/2014 03:25 AM, Ian Campbell wrote:
> > On Thu, 2014-11-06 at 11:06 -0500, Gedalya wrote:
> >>> I suspect we will need to backport some xen-netback patch or other. I've
> >>> put some feelers out to see if any of the upstream devs have any
> >>> hints...
> >> OK so if it's just a matter of changing a kernel on one box, I can
> >> perhaps try to build a 3.18 this weekend
> > I think these commits, which are in v3.18-rc3, are probably the ones:
> >
> > ecf08d2 xen-netback: reintroduce guest Rx stall detection
> > f48da8b xen-netback: fix unlimited guest Rx internal queue and carrier flapping
> > bc96f64 xen-netback: make feature-rx-notify mandatory
> >
> > I'll investigate a backport/check if they are destined for stable@.
> >
> > Ian.
> >
> Tried to just frankenport xen-netback from 3.18 into 3.16, didn't work 
> very well ;-)

Did you backport just the three commits above, or the full set of
xen-netback changes from 3.18?

> I'm running 3.18rc3+ now. Bombarding the downed interface by 
> broadcast-pinging the network it's on causes the following
> [  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled
> [  281.396080] breth1: port 3(vif3.0) entered disabled state
> and that's it. This is instead of the previously repeated 'draining TX 
> queue' messages.
> Let's assume it won't crash, I'll let you know if this assumption turns 
> out to be wrong.
> 
> I'm kind of curious why this is preceded by
> [   46.232475] vif vif-3-0 vif3.0: Guest Rx ready
> [   46.232514] IPv6: ADDRCONF(NETDEV_CHANGE): vif3.0: link becomes ready
> And the host figures out it's down only when traffic comes and doesn't 
> get through.
> I guess this might change if I run 3.18 in the guest too?

I *think* this is the intended behaviour of "xen-netback: reintroduce
guest Rx stall detection": since the interface is down on the guest
side, it is considered stalled (i.e. not processing any packets).

I think the "link becomes ready" message refers to the backend end of
the connection; it's like a network cable that is only plugged in at
one end. Perhaps things could be smarter, but I think that would be an
upstream thing.
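
If it helps while you keep an eye on that box, here is a minimal sketch
for tracking the last "Guest Rx" state the backend reported per vif. It
assumes the kernel log lines look exactly like the ones quoted above;
the function name guest_rx_states is made up for illustration and is not
part of any Xen or kernel tooling.

```python
import re

# Assumed format, taken from the log lines quoted in this thread, e.g.
#   [  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled
_RX_EVENT = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+vif\s+\S+\s+(?P<iface>\S+):\s+"
    r"Guest Rx (?P<state>ready|stalled)\s*$"
)


def guest_rx_states(lines):
    """Return {iface: (state, timestamp)} for the last Guest Rx event per vif.

    Hypothetical helper: lines that do not match the assumed format
    (bridge or IPv6 messages, for instance) are ignored.
    """
    states = {}
    for line in lines:
        match = _RX_EVENT.match(line.strip())
        if match:
            states[match.group("iface")] = (
                match.group("state"),
                float(match.group("ts")),
            )
    return states


if __name__ == "__main__":
    sample = [
        "[   46.232475] vif vif-3-0 vif3.0: Guest Rx ready",
        "[   46.232514] IPv6: ADDRCONF(NETDEV_CHANGE): vif3.0: link becomes ready",
        "[  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled",
        "[  281.396080] breth1: port 3(vif3.0) entered disabled state",
    ]
    print(guest_rx_states(sample))
```

Feeding it the output of dmesg split into lines should show which vifs
the backend currently regards as stalled, assuming the message text is
the same on your kernel.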

