
Bug#768478: marked as done (linux-image-3.16 (wheezy-backports and jessie): outbound TCP throughput drops to zero for several drivers)



Your message dated Mon, 15 Dec 2014 14:57:46 -0800
with message-id <CAJbdudWuDkm9C7S34JOBrGGh=H2Q+AoxU8M0zR+SDgOncKnxNw@mail.gmail.com>
and subject line This made it in to 3.16.7-ckt2-1 and the corresponding wheezy backport
has caused the Debian Bug report #768478,
regarding linux-image-3.16 (wheezy-backports and jessie): outbound TCP throughput drops to zero for several drivers
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
768478: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768478
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: src:linux
Version: 3.16
Severity: important
Tags: patch

Dear Kernel team,

There is a TCP bug in kernel 3.16, described as:

"Some drivers are unable to perform TX completions in a bound time.
They instead call skb_orphan()

Problem is skb_fclone_busy() has to detect this case, otherwise
we block TCP retransmits and can freeze unlucky tcp sessions on
mostly idle hosts."

The bug was first reported privately; we are following up with this BTS submission. A Google engineer has already submitted a fix upstream: https://patchwork.ozlabs.org/patch/405110/

This bug is likely to surface in userland, affects several drivers, and is sender-side only:

# git grep -n skb_orphan -- drivers/net
drivers/net/ethernet/chelsio/cxgb3/sge.c:1313:          skb_orphan(skb);
drivers/net/ethernet/chelsio/cxgb4/sge.c:1167:          skb_orphan(skb);
drivers/net/ethernet/chelsio/cxgb4vf/sge.c:1337:                skb_orphan(skb);
drivers/net/ethernet/sun/niu.c:6674:            skb_orphan(skb);
drivers/net/loopback.c:77:      skb_orphan(skb);
drivers/net/tun.c:789:  if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
drivers/net/tun.c:800:  skb_orphan(skb);
drivers/net/virtio_net.c:938:   skb_orphan(skb);
drivers/net/wireless/ath/wil6210/txrx.c:532:    skb_orphan(skb);
drivers/net/wireless/brcm80211/brcmfmac/msgbuf.c:718:           skb_orphan(skb);
drivers/net/wireless/libertas/tx.c:156:         skb_orphan(skb);
drivers/net/wireless/mac80211_hwsim.c:992:      skb_orphan(skb);

The Google engineer also states that the backported patch for the 3.16 or 3.17 kernel is much simpler:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4e4932b5079b..a8794367cd20 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2082,7 +2082,8 @@ static bool skb_still_in_host_queue(const struct sock *sk,
        const struct sk_buff *fclone = skb + 1;
 
        if (unlikely(skb->fclone == SKB_FCLONE_ORIG &&
-                    fclone->fclone == SKB_FCLONE_CLONE)) {
+                    fclone->fclone == SKB_FCLONE_CLONE &&
+                    fclone->sk == sk)) {
                NET_INC_STATS_BH(sock_net(sk),
                                 LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES);
                return true;
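
To illustrate why the extra `fclone->sk == sk` test matters, here is a minimal user-space sketch. It is not kernel code: `sock_model`, `skb_model`, `orphan_model()` and both `in_host_queue_*()` functions are hypothetical stand-ins that only mirror the fields used in the diff above, on the assumption that skb_orphan() leaves the fast clone with no owning socket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; NOT the real definitions. */
enum fclone_state { FCLONE_UNAVAILABLE, FCLONE_ORIG, FCLONE_CLONE };

struct sock_model {
    int id;
};

struct skb_model {
    enum fclone_state fclone;     /* fast-clone state of this buffer */
    const struct sock_model *sk;  /* owning socket, NULL once orphaned */
};

/* Models the effect of skb_orphan(): the buffer no longer has an owner. */
static void orphan_model(struct skb_model *skb)
{
    skb->sk = NULL;
}

/* Check as it was before the patch: only the fast-clone state is examined. */
static bool in_host_queue_before(const struct sock_model *sk,
                                 const struct skb_model *skb)
{
    const struct skb_model *fclone = skb + 1; /* clone sits next to the original */

    (void)sk; /* the socket is not consulted in the old check */
    return skb->fclone == FCLONE_ORIG && fclone->fclone == FCLONE_CLONE;
}

/* Check with the patch: the clone must also still belong to this socket. */
static bool in_host_queue_after(const struct sock_model *sk,
                                const struct skb_model *skb)
{
    const struct skb_model *fclone = skb + 1;

    return skb->fclone == FCLONE_ORIG &&
           fclone->fclone == FCLONE_CLONE &&
           fclone->sk == sk;
}
```

In this model, once a driver has orphaned the clone, the old check still reports the buffer as sitting in a host queue, so a retransmit would be skipped. The patched check returns false in that case, which allows the retransmit to go ahead.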

The timing is understandably very bad, but ideally this should be addressed in Jessie now rather than in a later backports update.

Thank you,
Eric

--- End Message ---
--- Begin Message ---
Thanks to Ben Hutchings for applying this fix and for bringing it to my attention when I asked on IRC. changelog.Debian.gz confirms that the fix is included. Marking this bug as closed.

- Jimmy

--- End Message ---
