Re: Network Performance Degrading over random amount of time
Hi
On Sat, Apr 26, 2014 at 01:01:25PM +0200, h@xx0r.eu wrote:
> On 2014-04-26 12:44, h@xx0r.eu wrote:
> >On 2014-04-22 10:38, h@xx0r.eu wrote:
> >>On 2014-04-20 23:49, Karl E. Jorgensen wrote:
> >>>Hi
> >>>
> >>>On Sun, Apr 20, 2014 at 01:01:53PM +0200, h@xx0r.eu wrote:
> >>>>Hi List,
> >>>>maybe you have a clue about the issues I have been having for
> >>>>several months.
> >>>>My home server is running Debian Jessie right now; the network issues
> >>>>were there with wheezy as well.
> >>>>After a fresh boot my network behaves as it should, achieving near
> >>>>gigabit speeds, which is nice. After a random amount of uptime, though, my
> >>>>throughput degrades to below 100 Mbit network speeds (about 3.5 MB/s),
> >>>>which I measured using iperf.
> >>>
> >>>You don't explicitly say... Does a reboot "cure" the problem
> >>>(temporarily?)
> >>
> >>Yep, that's exactly what a reboot does for me. I tend to reboot about
> >>once every 2-3 days because of this issue, not something you would
> >>expect from a Unix OS :D
> >>
> >>>
> >>>If so, does a "ifdown eth0"[1] + "ifup eth0" have the same
> >>>effect? (if
> >>>necessary: Unplug and re-plug the cable between "ifdown" and
> >>>"ifup"...) [A full reboot is a bit like a sledge hammer... very
> >>>crude]
> >>
> >>I have yet to try this; I will report back when I have the performance
> >>problem again and can try it
> >>
> >
> >Just got the chance to try, and yes, an ifdown eth0 -> Cable replug ->
> >ifup eth0 also cures this problem
Sounds good.
Is a cable replug _necessary_ to cure it? If it can be "cured" (or at
least worked around) with ifdown/ifup on its own (possibly with
rmmod/modprobe of relevant kernel modules in between), then you at
least have a scriptable workaround.
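A minimal sketch of what such a workaround script could look like, assuming ifdown/ifup alone turn out to be enough (a script obviously cannot replug the cable). The function names and the `module` parameter are hypothetical; the thread does not say which kernel module drives the card, so the rmmod/modprobe step is only added when the caller supplies a module name.

```python
import subprocess


def build_workaround_commands(iface="eth0", module=None):
    """Return the commands (as argv lists) that bounce `iface`.

    `module` is the NIC's kernel module name. It is not stated in the
    thread, so it is optional and must be supplied by the caller.
    """
    commands = [["ifdown", iface]]
    if module is not None:
        # Optional step: unload and reload the driver between down and up.
        commands.append(["rmmod", module])
        commands.append(["modprobe", module])
    commands.append(["ifup", iface])
    return commands


def run_workaround(iface="eth0", module=None, dry_run=True):
    """Print the commands (dry_run=True) or execute them as root."""
    for argv in build_workaround_commands(iface, module):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)
```

Calling `run_workaround()` only prints the two commands; `run_workaround(dry_run=False)` needs root and really takes the interface down, so do not run it over an SSH session that goes through eth0.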
> >>>Anything in the kernel message log? (e.g. output of "dmesg" or
> >>>/var/log/kern.log) It would be interesting if the kernel spat
> >>>out some
> >>>messages around the time of the degradation... E.g. link-level
> >>>renegotiation or similar.
> >>>
> >>>Also: Anything interesting in the output of "ifconfig eth0" ? I'm
> >>>particularly interested in the counters for errors, dropped,
> >>>overruns,
> >>>frame/carrier counts: These counters may show interesting changes
> >>>around the time of the degradation...
> >>>
> >>
> >>I will note this down for the next performance degradation as well.
> >>Output of dmesg looks a bit suspicious:
> >
> >[40886.039833] irq 16: nobody cared (try booting with the "irqpoll" option)
ooh. Interesting. If you're on wheezy, use "dmesg --ctime" or "dmesg
-T" to get human-readable timestamps. (or just check
/var/log/kern.log)
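If neither option is to hand, the conversion can be approximated by hand: the number in brackets is seconds since boot, so adding it to the boot time gives a wall-clock time. A rough sketch, with hypothetical function names, assuming the boot time is derived from /proc/uptime (the result can be off if the machine has been suspended):

```python
import re
from datetime import datetime, timedelta

# Matches the "[40886.039833]" prefix of a kernel log line.
_STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def boot_time(now=None, uptime_path="/proc/uptime"):
    """Estimate when the machine booted: current time minus uptime."""
    now = now or datetime.now()
    with open(uptime_path) as f:
        uptime_seconds = float(f.read().split()[0])
    return now - timedelta(seconds=uptime_seconds)


def wall_clock(line, booted_at):
    """Return the approximate wall-clock time of one dmesg line, or None."""
    match = _STAMP.match(line)
    if match is None:
        return None
    return booted_at + timedelta(seconds=float(match.group(1)))
```

For example, `wall_clock("[40886.040506] Disabling IRQ #16", boot_time())` gives the approximate time at which the IRQ was disabled, which can then be compared with when the throughput dropped.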
[snipped most of kernel output]
..
> >[40886.040506] Disabling IRQ #16
> >
> >
> >IRQ16 is related to eth0 according to /proc/interrupts:
> >
> >16: 3164992 3462922 0 0 IO-APIC-fasteoi pata_via, eth0
Yes - it would be an amazing coincidence if it is not related.
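To check this kind of IRQ sharing without eyeballing the file, something along these lines should work. It is only a sketch with hypothetical function names, assuming the usual /proc/interrupts layout of an IRQ label, one count per CPU, and then a free-text description that lists the devices:

```python
def parse_interrupt_line(line):
    """Split one /proc/interrupts row into (irq, per_cpu_counts, description)."""
    label, _, rest = line.partition(":")
    fields = rest.split()
    counts = []
    # The leading all-digit fields are the per-CPU interrupt counts.
    while fields and fields[0].isdigit():
        counts.append(int(fields.pop(0)))
    return label.strip(), counts, " ".join(fields)


def irqs_for(device, text):
    """Return (irq, total_count, description) for rows mentioning `device`."""
    matches = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # header row with the CPU names
        irq, counts, description = parse_interrupt_line(line)
        if device in description:
            matches.append((irq, sum(counts), description))
    return matches
```

Usage would be `irqs_for("eth0", open("/proc/interrupts").read())`; a description that names more than one device (here pata_via and eth0) means the IRQ line is shared.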
> >Output of ifconfig looks unsuspicious, a few dropped packets but
> >nothing major:
> >
> >eth0 Link encap:Ethernet Hardware Adresse 00:0e:0c:b9:5e:1d
> > inet Adresse:192.168.1.20 Bcast:192.168.1.255 Maske:255.255.255.0
> > inet6-Adresse: fda3:32bd:abab:0:20e:cff:feb9:5e1d/64 Gültigkeitsbereich:Global
> > inet6-Adresse: fe80::20e:cff:feb9:5e1d/64 Gültigkeitsbereich:Verbindung
> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metrik:1
> > RX packets:11881446 errors:0 dropped:882 overruns:0 frame:0
> > TX packets:29392900 errors:0 dropped:0 overruns:0 carrier:0
> > Kollisionen:0 Sendewarteschlangenlänge:1000
> > RX bytes:7149517599 (6.6 GiB) TX bytes:69090488843 (64.3 GiB)
Oh - German :-) Interesting that it is only partly i18n'd. I don't
think "errors" is correct German? Not "fehler"? (I guess
you would know for sure, I'm only a Dane with rusty German skills...)
I wouldn't be surprised if the dropped packets are a result of the
cable un-plug/re-plug (assuming the output is from after the cable
play).
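One way to tell whether the counters move at the time of the degradation, rather than because of the cable play, is to snapshot them before and after and compare. A small sketch, assuming the kernel's per-interface counter files under /sys/class/net/<iface>/statistics/; the function names are hypothetical, and the selection of counters only roughly corresponds to the columns ifconfig prints:

```python
from pathlib import Path

# Error-related counters only; the mapping to ifconfig's column names
# (errors, dropped, overruns, frame, carrier) is approximate.
COUNTERS = (
    "rx_errors", "rx_dropped", "rx_fifo_errors", "rx_frame_errors",
    "tx_errors", "tx_dropped", "tx_carrier_errors",
)


def read_counters(iface="eth0", root="/sys/class/net"):
    """Snapshot the selected counters for one interface as a dict of ints."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def changed(before, after):
    """Return only the counters that differ, mapped to their increase."""
    return {
        name: after[name] - before[name]
        for name in before
        if after[name] != before[name]
    }
```

Take one snapshot with `read_counters()` while the network is still fast and another once it has slowed down; `changed(first, second)` then shows which counters, if any, moved in between.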
> Conclusion:
> With all this information I was able to track down the root cause of my
> issue on my own. I guess I'm screwed, since my ASUS board uses
> "PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge
> (rev 01)", which is commonly known to have a bug in its
> handling of interrupts on the PCI bus.
ouch. I'm no PCI expert... If the bug only affects *some* interrupt
numbers, it may be possible to force the card/kernel module to use a
different IRQ? I'm thinking kernel module options and/or BIOS
settings?
> Two options for me now: switch to a much more expensive PCIe gigabit
> card, or buy an even more expensive new mainboard...
Perhaps a BIOS/firmware upgrade is possible?
> Well... Fuck
Surely there are more suitable German expletives here? But I get the
sentiment :-)
Regards
--
Karl E. Jorgensen