
Re: Any way to tell where the network problem is?



On Tue, 14 Feb 2012 13:35:45 -0800, Ross Boylan wrote:

> On Tue, 2012-02-14 at 20:55 +0000, Camaleón wrote:

(...)

>> >         Feb  8 19:45:40 corn kernel: [1987612.981170] ethfast:
>> >         Detected Tx Unit Hang:
>> 
>> (...)
>> 
>> >         Feb  8 19:45:49 corn kernel: [1987622.027816] NETDEV
>> >         WATCHDOG: ethfast: transmit timed out
>> >         Feb  8 19:45:52 corn kernel: [1987624.923313] ethfast: Link
>> >         is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
>> 
>> By reading the logs, I can point you to these two bugs:
>> 
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518182
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=657689

> Thank you.  I do not have > 4g RAM, but my recent network upgrades took
> me from 100Mb/s to 1000Mb/s, so the load has definitely gone up.  There
> were problems before, which may or may not have the same cause.

Well, another user reported the same error with 2 GiB of RAM (message #60), 
so it may also be related to your problem. Anyway, when this happens, do 
you see a kernel trace/oops in your "/var/log/syslog"?
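
If going through the log by hand is tedious, something like the following 
could help. It is only a minimal sketch: the first two patterns come from 
the log excerpts quoted above, while "Oops" and "Call Trace" are my 
assumption about what a kernel trace would look like in your syslog, and 
the function names are made up for illustration.

```python
import re

# "Detected Tx Unit Hang" and "NETDEV WATCHDOG" appear in the logs quoted in
# this thread; "Oops" and "Call Trace" are assumed generic kernel markers.
PATTERNS = [
    re.compile(r"Detected Tx Unit Hang"),
    re.compile(r"NETDEV WATCHDOG"),
    re.compile(r"\bOops\b"),
    re.compile(r"Call Trace"),
]


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path="/var/log/syslog"):
    """Scan a log file; undecodable bytes are replaced instead of failing."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Running scan_file() as a user allowed to read the log and printing the 
returned pairs should show whether a trace turns up near the hang 
timestamps.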

>> As you are using lenny,

> Yes.

>> I would try with an updated kernel (2.6.32) from backports or better
>> yet, take this as an opportunity to upgrade to Squeeze or another
>> supported version :-)

> I want to upgrade, but need to test it and fix my mail first...

You can boot a LiveCD with an updated kernel and check whether the 
network hang is also reproducible from there.
    
>> I see. Anyway, although the laptop is not at its bests, the logs are
>> concerning the linux box (the ethernet driver "hangs"). And one more
>> thing... "ethfast" looks like a 10/100 driver though it says "link up
>> 1000 Mbps". What kernel modules are you loading for both cards?

> lsmod shows e100 and e1000e.  I don't think I've done any customization
> related to these modules.  Here are some highlights from startup: 
> Jan 17 11:54:13 corn kernel: [    2.104915] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2 
> Jan 17 11:54:13 corn kernel: [    2.105673] e1000e: Copyright (c) 1999-2008 Intel Corporation. 
> Jan 17 11:54:13 corn kernel: [    2.105759] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16 
> Jan 17 11:54:13 corn kernel: [    2.106703] PCI: Setting latency timer of device 0000:02:00.0 to 64 
> Jan 17 11:54:13 corn kernel: [    2.205678] No dock devices found. 
> Jan 17 11:54:13 corn kernel: [    2.228257] eth0: (PCI Express:2.5GB/s:Width x1) 00:13:20:b7:23:53 
> Jan 17 11:54:13 corn kernel: [    2.229019] eth0: Intel(R) PRO/1000 Network Connection 
> Jan 17 11:54:13 corn kernel: [    2.229807] eth0: MAC: 2, PHY: 2, PBA No: ffffff-0ff 

(...)

> Jan 17 11:54:13 corn kernel: [    2.306212] e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI 
> Jan 17 11:54:13 corn kernel: [    2.339515] e100: Copyright(c) 1999-2006 Intel Corporation 
> Jan 17 11:54:13 corn kernel: [    2.383510] ACPI: PCI Interrupt 0000:05:01.0[A] -> GSI 22 (level, low) -> IRQ 22 
> Jan 17 11:54:13 corn kernel: [    2.431297] e100: eth1: e100_probe: addr 0x90028000, irq 22, MAC addr 00:...

Mmmm, it loads e1000e and e100 for the cards, which I think is fine. I 
wonder where the "ethfast" name above comes from :-?
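
One way to check which driver sits behind each interface name is to read 
the "device/driver" symlink under /sys/class/net. A rough sketch follows; 
the helper names are made up for illustration, and I am assuming sysfs is 
mounted in the usual place on your lenny box:

```python
import os


def driver_for(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver bound to iface, or None if there is none.

    Virtual interfaces such as "lo" have no device/driver link.
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None


def drivers_by_interface(sysfs_root="/sys/class/net"):
    """Map every interface name found under sysfs_root to its driver."""
    try:
        names = sorted(os.listdir(sysfs_root))
    except OSError:
        return {}
    return {name: driver_for(name, sysfs_root) for name in names}
```

If driver_for("ethfast") returns e1000e, the name is just a renamed 
gigabit interface and not a separate 10/100 driver.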
 
> Thank you so much for the diagnosis; the network problems have been
> driving me nuts, but the server is the last place I thought would be
> responsible.  Perhaps this also has something to do with fact that
> throughput has topped out at 300Mb/s, and that imposes a high CPU load
> on the laptop.

Another thing you can try is using a different method for the 
transfers, such as FTP or SSH. Samba can be CPU intensive, and 
I've also been in situations where transferring large amounts of data 
(>30 GiB) over a Samba share from Windows clients hung in the middle of 
the transfer.

Greetings,

-- 
Camaleón

