Bug#858125: e1000: ethernet interface hangs occasionally, kernel reports hang
- To: Ben Hutchings <ben@decadent.org.uk>
- Cc: 858125@bugs.debian.org
- Subject: Bug#858125: e1000: ethernet interface hangs occasionally, kernel reports hang
- From: "Bruce Momjian,,," <bruce@momjian.us>
- Date: Fri, 11 Aug 2017 18:43:37 -0400
- Message-id: <20170811224337.GU20241@momjian.us>
- Reply-to: "Bruce Momjian,,," <bruce@momjian.us>, 858125@bugs.debian.org
- In-reply-to: <1490150550.2626.2.camel@decadent.org.uk>
- References: <20170318160038.3889.71115.reportbug@momjian.us> <1489870930.2852.61.camel@decadent.org.uk> <20170318211050.GE20085@momjian.us> <20170318220653.GU4152@decadent.org.uk> <20170318220933.GG20085@momjian.us> <20170318223330.GA7289@momjian.us> <20170321200411.GA3179@momjian.us> <20170321223642.GA9174@momjian.us> <1490150550.2626.2.camel@decadent.org.uk> <20170318160038.3889.71115.reportbug@momjian.us>
I have determined that Debian was complaining about my ethernet port
because I had flow control enabled on the switch, and the switch was
getting easily overwhelmed and hanging, so the Debian resets were valid.
Thank you for the research on this. I think you can close this case.
---------------------------------------------------------------------------
On Wed, Mar 22, 2017 at 02:42:30AM +0000, Ben Hutchings wrote:
> Control: retitle -1 TX watchdog fires on e1000e interface with flow control enabled
>
> On Tue, 2017-03-21 at 18:36 -0400, Bruce Momjian,,, wrote:
> > On Tue, Mar 21, 2017 at 04:04:11PM -0400, Bruce Momjian,,, wrote:
> > > I think this proves my problems are related to flow control. How would
> > > you like to proceed? Is there a patch or change you would like me to
> > > test? Just close the ticket?
> > >
> > > I have a fix, but it is likely others would not know they had this
> > > problem unless they were monitoring their kernel logs or their network
> > > traffic for lag.
> >
> > Oh, I should also mention the port that is having problems is connected
> > to a NetGear GS108Ev3 switch, with current firmware, version 2.00.09.
> > The port connected to my Actiontec FIOS router is not having problems.
>
> I don't know about any specific bug, but if the switch sends flow
> control XOFF frames continually for long enough (usually 5 seconds)
> this will trigger the TX watchdog.
>
> It sounds like your switch implements flow control properly (some
> broken switches auto-negotiate it but actually flood flow control
> frames). However, if a device on some other port (that also has flow
> control enabled) sends XOFF frames continually *and* your server sends
> frames that should go to that other port, the switch will do the same
> to the server once the switch's internal queue has filled up.
>
> If the switch has port statistics including numbers of pause frames
> then you can see where they are coming from, but I think it doesn't.
> Without that information it's going to be hard to tell exactly where
> the fault lies.
>
> The e1000e driver *does* have statistics for pause frames transmitted
> and received (run: "ethtool -S eth0 | grep flow_control"). If you log
> these every second then it should be possible to see what happens
> around the time the TX watchdog fires. That could provide some clues
> as to whether the NIC is behaving correctly.
>
> Ben.
>
> --
> Ben Hutchings
> Power corrupts. Absolute power is kind of neat.
> - John Lehman, Secretary of the US Navy
> 1981-1987
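
For anyone who wants to try the per-second logging of pause-frame
counters suggested above, here is a minimal sketch, not a tested tool.
It assumes the interface is named eth0, that "ethtool -S" prints one
"name: value" pair per line, and that the driver's counter names
contain "flow_control" (as the grep above implies for e1000e). The
function names are hypothetical and chosen only for illustration.

```python
import re
import subprocess
import time

# Assumption: each statistic is printed as "<name>: <integer>" and the
# pause-frame counters have "flow_control" somewhere in their name.
LINE_RE = re.compile(r"^\s*(\S*flow_control\S*):\s*(\d+)\s*$")


def parse_flow_control(ethtool_output):
    """Return {counter_name: value} for every flow_control line."""
    counters = {}
    for line in ethtool_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def compute_deltas(previous, current):
    """Return how much each counter grew between two samples."""
    return {name: value - previous.get(name, 0)
            for name, value in current.items()}


def read_counters(interface):
    """Run "ethtool -S <interface>" and parse its flow control counters."""
    result = subprocess.run(
        ["ethtool", "-S", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_flow_control(result.stdout)


def log_deltas(interface="eth0", interval=1.0):
    """Print timestamped per-interval counter changes until interrupted."""
    previous = read_counters(interface)
    while True:
        time.sleep(interval)
        current = read_counters(interface)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, compute_deltas(previous, current), flush=True)
        previous = current
```

Calling log_deltas("eth0") with its output redirected to a file gives
timestamps that can be lined up against the TX watchdog message in the
kernel log. Logging deltas rather than raw totals makes a burst of XOFF
frames in the seconds before the watchdog fires easier to spot.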
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +