[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#642911: TX watchdog timeout on RTL8168B



Ben Hutchings wrote:

> A backport to 3.0.y is good enough for Debian's purposes since squeeze
> now has r8169 from 3.0.2.

Oh!  Thanks for the hint.

Here's a rough backport of the patch to Greg's linux-3.0.y branch and
split into 4 pieces, with an explanation involving some guesswork for
each.  I suspect the combination will work, but I easily could have
misunderstood some of the patches, so at least some details of the
explanations are probably wrong.

I would be particularly interested to hear if the patches seem sane
and work and if the effect of patch 4/4 is noticeable.

The patches seem to apply to Debian squeeze without trouble, too, in
case that's easier to test.

Thoughts of all kinds welcome, as always.

  r8169: ignore masked events in the irq handler
  r8169: mask RxFIFOOver interruption for 8168c, 8105e, and newer chips
  r8169: remove now-redundant case statements in rtl8169_interrupt
  r8169: use event mask when disabling interrupts

 drivers/net/r8169.c |   54 ++++++++++++++++++++++----------------------------
 1 files changed, 24 insertions(+), 30 deletions(-)
From: Jonathan Nieder <jrnieder@gmail.com>
Subject: r8169: ignore masked events in the irq handler

[ Extracted from commit 811fd3010cf512f2e23e6c4c912aad54516dc706
  upstream, which was written by François Romieu. ]

As commit f60ac8e7ab7c ("r8169: prevent RxFIFO induced loops in the
irq handler") explains, while the RxFIFOOver interruption is masked
for most 8168, nothing prevents it from appearing in the irq status
word.  Enforce a no-processing policy for such events by forgetting
them before we have a chance to consider doing anything about them
(which tends to have unpleasant results like making the chip lock
up).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 drivers/net/r8169.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 5f838ef92494..2f66650bf69c 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -5001,6 +5001,10 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	 */
 	status = RTL_R16(IntrStatus);
 	while (status && status != 0xffff) {
+		status &= tp->intr_event;
+		if (!status)
+			break;
+
 		handled = 1;
 
 		/* Handle all of the error cases first. These will reset
-- 
1.7.9

From: Jonathan Nieder <jrnieder@gmail.com>
Subject: r8169: mask RxFIFOOver interruption for 8168c, 8105e, and newer chips

[ Extracted from commit 811fd3010cf512f2e23e6c4c912aad54516dc706
  upstream, which was written by François Romieu. ]

Realtek has specified that the post 8168c gigabit chips and the post
8105e fast ethernet chips recover automatically from a Rx FIFO overflow.
The driver does not need to clear the RxFIFOOver bit of IntrStatus and
it should rather avoid messing it.

RxFIFOOver was already ignored in rtl_cfg_infos[RTL_CFG_1] for the
whole 8168 line of chips with two exceptions:
- RTL_GIGA_MAC_VER_22 since b5ba6d12bdac21bc0620a5089e0f24e362645efd
  ("use RxFIFO overflow workaround for 8168c chipset.").
  This one should now be correctly handled.
- RTL_GIGA_MAC_VER_11 (8168b) which requires a different Rx FIFO
  overflow processing.

Though it does not conform to Realtek suggestion above, the updated
driver includes no change for RTL_GIGA_MAC_VER_12 and RTL_GIGA_MAC_VER_17.
Both are 8168b. RTL_GIGA_MAC_VER_12 is common and a bit old so I'd rather
wait for experimental evidence that the change suggested by Realtek really
helps or does not hurt in unexpected ways.

RxFIFOOver is masked for post 8105e 810x chips, namely the sole 8105e
(RTL_GIGA_MAC_VER_30) itself.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/r8169.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 2f66650bf69c..1b69f92363df 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4093,8 +4093,7 @@ static void rtl_hw_start_8168(struct net_device *dev)
 	RTL_W16(IntrMitigate, 0x5151);
 
 	/* Work around for RxFIFO overflow. */
-	if (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
-	    tp->mac_version == RTL_GIGA_MAC_VER_22) {
+	if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
 		tp->intr_event |= RxFIFOOver | PCSTimeout;
 		tp->intr_event &= ~RxOverflow;
 	}
@@ -4276,6 +4275,11 @@ static void rtl_hw_start_8101(struct net_device *dev)
 	void __iomem *ioaddr = tp->mmio_addr;
 	struct pci_dev *pdev = tp->pci_dev;
 
+	if (tp->mac_version >= RTL_GIGA_MAC_VER_30) {
+		tp->intr_event &= ~RxFIFOOver;
+		tp->napi_event &= ~RxFIFOOver;
+	}
+
 	if (tp->mac_version == RTL_GIGA_MAC_VER_13 ||
 	    tp->mac_version == RTL_GIGA_MAC_VER_16) {
 		int cap = tp->pcie_cap;
-- 
1.7.9

From: Jonathan Nieder <jrnieder@gmail.com>
Subject: r8169: remove now-redundant case statements in rtl8169_interrupt

[ Extracted from commit 811fd3010cf512f2e23e6c4c912aad54516dc706
  upstream, which was written by François Romieu. ]

After the patch "r8169: ignore masked events in the irq handler", this
code path should never be reached on these chips.

Removed case statements in rtl8169_interrupt are only 8168 relevant.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 drivers/net/r8169.c |   18 ------------------
 1 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 1b69f92363df..01fff010fbf8 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -5023,27 +5023,9 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 			switch (tp->mac_version) {
 			/* Work around for rx fifo overflow */
 			case RTL_GIGA_MAC_VER_11:
-			case RTL_GIGA_MAC_VER_22:
-			case RTL_GIGA_MAC_VER_26:
 				netif_stop_queue(dev);
 				rtl8169_tx_timeout(dev);
 				goto done;
-			/* Testers needed. */
-			case RTL_GIGA_MAC_VER_17:
-			case RTL_GIGA_MAC_VER_19:
-			case RTL_GIGA_MAC_VER_20:
-			case RTL_GIGA_MAC_VER_21:
-			case RTL_GIGA_MAC_VER_23:
-			case RTL_GIGA_MAC_VER_24:
-			case RTL_GIGA_MAC_VER_27:
-			case RTL_GIGA_MAC_VER_28:
-			case RTL_GIGA_MAC_VER_31:
-			/* Experimental science. Pktgen proof. */
-			case RTL_GIGA_MAC_VER_12:
-			case RTL_GIGA_MAC_VER_25:
-				if (status == RxFIFOOver)
-					goto done;
-				break;
 			default:
 				break;
 			}
-- 
1.7.9

From: Jonathan Nieder <jrnieder@gmail.com>
Subject: r8169: use event mask when disabling interrupts

[ Extracted from commit 811fd3010cf512f2e23e6c4c912aad54516dc706
  upstream, which was written by François Romieu. ]

Not sure what this one is about, to be honest.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 drivers/net/r8169.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 01fff010fbf8..c15a84eac2e5 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1043,17 +1043,21 @@ static u8 rtl8168d_efuse_read(void __iomem *ioaddr, int reg_addr)
 	return value;
 }
 
-static void rtl8169_irq_mask_and_ack(void __iomem *ioaddr)
+static void rtl8169_irq_mask_and_ack(struct rtl8169_private *tp)
 {
+	void __iomem *ioaddr = tp->mmio_addr;
+
 	RTL_W16(IntrMask, 0x0000);
-
-	RTL_W16(IntrStatus, 0xffff);
+	RTL_W16(IntrStatus, tp->intr_event);
+	RTL_R8(ChipCmd);
 }
 
-static void rtl8169_asic_down(void __iomem *ioaddr)
+static void rtl8169_asic_down(struct rtl8169_private *tp)
 {
+	void __iomem *ioaddr = tp->mmio_addr;
+
 	RTL_W8(ChipCmd, 0x00);
-	rtl8169_irq_mask_and_ack(ioaddr);
+	rtl8169_irq_mask_and_ack(tp);
 	RTL_R16(CPlusCmd);
 }
 
@@ -3611,7 +3615,7 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
 	void __iomem *ioaddr = tp->mmio_addr;
 
 	/* Disable interrupts */
-	rtl8169_irq_mask_and_ack(ioaddr);
+	rtl8169_irq_mask_and_ack(tp);
 
 	if (tp->mac_version == RTL_GIGA_MAC_VER_27 ||
 	    tp->mac_version == RTL_GIGA_MAC_VER_28 ||
@@ -4543,7 +4547,7 @@ static void rtl8169_wait_for_quiescence(struct net_device *dev)
 	/* Wait for any pending NAPI task to complete */
 	napi_disable(&tp->napi);
 
-	rtl8169_irq_mask_and_ack(ioaddr);
+	rtl8169_irq_mask_and_ack(tp);
 
 	tp->intr_mask = 0xffff;
 	RTL_W16(IntrMask, tp->intr_event);
@@ -5015,7 +5019,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		 * the chip, so just exit the loop.
 		 */
 		if (unlikely(!netif_running(dev))) {
-			rtl8169_asic_down(ioaddr);
+			rtl8169_asic_down(tp);
 			break;
 		}
 
@@ -5120,7 +5124,7 @@ static void rtl8169_down(struct net_device *dev)
 
 	spin_lock_irq(&tp->lock);
 
-	rtl8169_asic_down(ioaddr);
+	rtl8169_asic_down(tp);
 	/*
 	 * At this point device interrupts can not be enabled in any function,
 	 * as netif_running is not true (rtl8169_interrupt, rtl8169_reset_task,
@@ -5374,7 +5378,7 @@ static void rtl_shutdown(struct pci_dev *pdev)
 
 	spin_lock_irq(&tp->lock);
 
-	rtl8169_asic_down(ioaddr);
+	rtl8169_asic_down(tp);
 
 	spin_unlock_irq(&tp->lock);
 
-- 
1.7.9


Reply to: