Re: Network Performance Degrading over random amount of time

To: h@xx0r.eu
Cc: debian-user@lists.debian.org, Karl E. Jørgensen <jorginator74@gmail.com>
Subject: Re: Network Performance Degrading over random amount of time
From: "Karl E. Jorgensen" <karl@jorgensen.org.uk>
Date: Sat, 26 Apr 2014 15:07:18 +0100
Message-id: <20140426140718.GA461@hawking>
In-reply-to: <9de4d662630831857583e4f811269982@xx0r.eu>
References: <640fc430144b16c990071c994976af66@xx0r.eu> <20140420214924.GA11030@hawking> <73dff043e137e48da6552aeb4ad5c2ea@xx0r.eu> <ae52168ec3a4d89895029bc7eba76953@xx0r.eu> <9de4d662630831857583e4f811269982@xx0r.eu>

Hi

On Sat, Apr 26, 2014 at 01:01:25PM +0200, h@xx0r.eu wrote:
> On 2014-04-26 12:44, h@xx0r.eu wrote:
> >On 2014-04-22 10:38, h@xx0r.eu wrote:
> >>On 2014-04-20 23:49, Karl E. Jorgensen wrote:
> >>>Hi
> >>>
> >>>On Sun, Apr 20, 2014 at 01:01:53PM +0200, h@xx0r.eu wrote:
> >>>>Hi List,
> >>>>Maybe you have a clue about the issues I'm having, which have
> >>>>persisted for several months.
> >>>>My home server is running Debian Jessie right now; the network issues
> >>>>were there with Wheezy as well.
> >>>>After a fresh boot my network behaves as it should, achieving near
> >>>>gigabit speeds, which is nice. After a random amount of uptime, though,
> >>>>my throughput degrades below 100 Mbit/s (to about 3.5 MB/s), as
> >>>>measured with iperf.
> >>>
> >>>You don't explicitly say... Does a reboot "cure" the problem
> >>>(temporarily?)
> >>
> >>Yep, that's exactly what a reboot does for me. I tend to reboot about
> >>once every 2-3 days because of this issue, not something you would
> >>expect from a Unix OS :D
> >>
> >>>
> >>>If so, does an "ifdown eth0"[1] + "ifup eth0" have the same
> >>>effect? (If necessary: unplug and re-plug the cable between
> >>>"ifdown" and "ifup"...) [A full reboot is a bit like a
> >>>sledgehammer... very crude]
> >>
> >>I have yet to try this; I will report back once the performance
> >>problem shows up again and I have tried it.
> >>
> >
> >Just got the chance to try, and yes: ifdown eth0 -> cable replug ->
> >ifup eth0 also cures this problem.

Sounds good. 

Is a cable replug _necessary_ to cure it?  If it can be "cured" (or at
least worked around) with ifdown/ifup on its own (possibly with
rmmod/modprobe of the relevant kernel modules in between), then you at
least have a scriptable workaround.
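A minimal sketch of what such a scriptable workaround could look like, assuming eth0 is managed by ifupdown and that the script runs as root. The thread never names the NIC's driver module, so the optional `module` argument is a placeholder the caller must fill in; whether ifdown/ifup alone (without the cable replug) is enough is still the open question above.

```python
# Sketch of an ifdown/ifup "bounce" for a misbehaving interface (run as root).
# Assumptions: the interface is managed by ifupdown, and the driver module
# name (if any) is supplied by the caller; it is not known from the thread.
import subprocess
from typing import List, Optional


def bounce_commands(iface: str, module: Optional[str] = None) -> List[List[str]]:
    """Build the command sequence: ifdown, optional module reload, ifup."""
    cmds = [["ifdown", iface]]
    if module:
        # Unload and reload the driver between taking the interface
        # down and bringing it back up.
        cmds += [["rmmod", module], ["modprobe", module]]
    cmds.append(["ifup", iface])
    return cmds


def bounce(iface: str, module: Optional[str] = None) -> None:
    """Run the sequence, stopping at the first command that fails."""
    for cmd in bounce_commands(iface, module):
        subprocess.run(cmd, check=True)
```

Calling `bounce("eth0")` from a cron job or a monitoring hook would then stand in for the reboot, if the ifdown/ifup cycle turns out to be sufficient.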

> >>>Anything in the kernel message log? (e.g. output of "dmesg" or
> >>>/var/log/kern.log) It would be interesting if the kernel spat
> >>>out some
> >>>messages around the time of the degradation...  E.g. link-level
> >>>renegotiation or similar.
> >>>
> >>>Also: Anything interesting in the output of "ifconfig eth0" ?  I'm
> >>>particularly interested in the counters for errors, dropped,
> >>>overruns,
> >>>frame/carrier counts: These counters may show interesting changes
> >>>around the time of the degradation...
> >>>
> >>
> >>I will note this down for the next performance degradation as well.
> >>The output of dmesg looks a bit suspicious:
> >
> >[40886.039833] irq 16: nobody cared (try booting with the "irqpoll" option)

ooh. Interesting. If you're on wheezy, use "dmesg --ctime" or "dmesg
-T" to get human-readable timestamps. (or just check
/var/log/kern.log)
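If the dmesg in use lacks -T, the bracketed stamp (seconds since boot) can be converted by hand. A small sketch, assuming boot time is estimated as "now minus the first field of /proc/uptime"; like "dmesg -T", this estimate can drift if the machine has been suspended since boot.

```python
# Sketch: turn a dmesg "[seconds.micro]" stamp into wall-clock time.
# Assumption: boot time = now - uptime from /proc/uptime (may drift after suspend).
import re
import time
from datetime import datetime, timedelta
from typing import Optional

STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def boot_time() -> datetime:
    """Estimate the boot time from the first field of /proc/uptime."""
    with open("/proc/uptime") as f:
        uptime = float(f.read().split()[0])
    return datetime.fromtimestamp(time.time() - uptime)


def wall_clock(line: str, booted: datetime) -> Optional[datetime]:
    """Return the wall-clock time of a dmesg line, or None if it has no stamp."""
    m = STAMP.match(line)
    if m is None:
        return None
    return booted + timedelta(seconds=float(m.group(1)))
```

For the message above, 40886 seconds is roughly 11 hours 21 minutes after boot.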

[snipped most of kernel output]
..
> >[40886.040506] Disabling IRQ #16
> >
> >
> >IRQ16 is related to eth0 according to /proc/interrupts:
> >
> >16:    3164992    3462922          0          0   IO-APIC-fasteoi   pata_via, eth0

Yes - it would be an amazing coincidence if it were not related.
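Note that the line above shows IRQ 16 shared between pata_via and eth0. A sketch for listing every shared line in /proc/interrupts, assuming the layout shown here (IRQ number, one counter per CPU, a single chip/type token such as "IO-APIC-fasteoi", then a comma-separated device list); newer kernels may split the chip column into two tokens, which this sketch does not handle.

```python
# Sketch: find IRQ lines shared by more than one device in /proc/interrupts.
# Assumes one chip/type token after the per-CPU counters, as in the output above.
from typing import Dict, List


def shared_irqs(text: str) -> Dict[str, List[str]]:
    """Map each numbered IRQ to its device names, keeping only shared ones."""
    lines = text.strip().splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        # Skip the per-CPU counters and the chip/type token.
        tail = " ".join(fields[ncpus + 1:])
        devices = [d.strip() for d in tail.split(",") if d.strip()]
        if len(devices) > 1:
            result[label] = devices
    return result
```

Usage: `shared_irqs(open("/proc/interrupts").read())`.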

> >The output of ifconfig looks unsuspicious; a few dropped packets, but
> >nothing major:
> >
> >eth0      Link encap:Ethernet  Hardware Adresse 00:0e:0c:b9:5e:1d
> >          inet Adresse:192.168.1.20  Bcast:192.168.1.255  Maske:255.255.255.0
> >          inet6-Adresse: fda3:32bd:abab:0:20e:cff:feb9:5e1d/64 Gültigkeitsbereich:Global
> >          inet6-Adresse: fe80::20e:cff:feb9:5e1d/64 Gültigkeitsbereich:Verbindung
> >          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metrik:1
> >          RX packets:11881446 errors:0 dropped:882 overruns:0 frame:0
> >          TX packets:29392900 errors:0 dropped:0 overruns:0 carrier:0
> >          Kollisionen:0 Sendewarteschlangenlänge:1000
> >          RX bytes:7149517599 (6.6 GiB)  TX bytes:69090488843 (64.3 GiB)

Oh - German :-) Interesting that it is only partly i18n'd. I don't
think "errors" is correct German? Not "Fehler"? (I guess you would
know for sure; I'm only a Dane with rusty German skills...)

I wouldn't be surprised if the dropped packets are a result of the
cable un-plug/re-plug (assuming the output is from after the cable
play).
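To see which counters actually move "around the time of the degradation", one option is to snapshot them while the link is fast and again once it is slow. A sketch, assuming the standard sysfs directory /sys/class/net/eth0/statistics, which holds one counter per file (rx_dropped, rx_errors and so on); the exact mapping onto ifconfig's column names is not assumed here.

```python
# Sketch: snapshot NIC error counters from sysfs and report which ones grew.
import os
from typing import Dict

WATCHED = ("rx_errors", "rx_dropped", "rx_fifo_errors", "rx_over_errors",
           "rx_frame_errors", "tx_errors", "tx_dropped",
           "tx_carrier_errors", "collisions")


def snapshot(stats_dir: str) -> Dict[str, int]:
    """Read every watched counter file that exists in stats_dir."""
    counters = {}
    for name in WATCHED:
        path = os.path.join(stats_dir, name)
        if os.path.exists(path):
            with open(path) as f:
                counters[name] = int(f.read().strip())
    return counters


def changed(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters whose value increased, with the increase."""
    return {k: after[k] - before[k]
            for k in before if k in after and after[k] > before[k]}
```

Usage: `good = snapshot("/sys/class/net/eth0/statistics")` after boot, then `changed(good, snapshot("/sys/class/net/eth0/statistics"))` once throughput has dropped.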

> Conclusion:
> With all this information I was able to track down the root cause of my
> issue on my own. I guess I'm screwed, since my ASUS board uses
> "PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge
> (rev 01)", which is commonly known to have a bug in its handling of
> interrupts on the PCI bus...
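For anyone wanting to check whether their own board carries the same bridge, a sketch that filters lspci output for the "ASM1083/1085" string quoted above. It only identifies the bridge by name; it does not test for the interrupt bug itself.

```python
# Sketch: look for the ASMedia ASM1083/1085 PCIe-to-PCI bridge in lspci output.
# Matches on the device text quoted in this thread; requires pciutils' lspci.
import subprocess
from typing import List


def asmedia_bridges(lspci_output: str) -> List[str]:
    """Return the lspci lines that mention the ASM1083/1085 bridge."""
    return [line for line in lspci_output.splitlines()
            if "ASM1083/1085" in line]


def local_check() -> List[str]:
    """Run lspci on this machine and filter its output."""
    out = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return asmedia_bridges(out.stdout)
```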

ouch.  I'm no PCI expert... If the bug only affects *some* interrupt
numbers, it may be possible to force the card/kernel module to use a
different IRQ?  I'm thinking kernel module options and/or BIOS
settings?
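If a BIOS setting or module option does move the card, a quick way to confirm it is to read the IRQ number the kernel assigned to the NIC's underlying PCI device. A sketch, assuming a PCI NIC for which sysfs exposes /sys/class/net/<iface>/device/irq; whether the IRQ can be moved at all depends on the board.

```python
# Sketch: read the IRQ number assigned to a network interface's PCI device.
# Assumes sysfs exposes <sysfs_root>/<iface>/device/irq (true for PCI NICs).
import os


def nic_irq(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the IRQ number listed for iface's underlying device."""
    path = os.path.join(sysfs_root, iface, "device", "irq")
    with open(path) as f:
        return int(f.read().strip())
```

Comparing `nic_irq("eth0")` before and after a change shows whether the card left IRQ 16.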

> Two options for me now: switch to a much more expensive PCIe gigabit
> card, or buy an even more expensive new mainboard...

Perhaps a BIOS/firmware upgrade is possible?

> Well... Fuck

Surely there are more suitable German expletives here? But I get the
sentiment :-)

Regards
-- 
Karl E. Jorgensen
