
Re: Any way to tell where the network problem is?



On Tue, 2012-02-14 at 20:55 +0000, Camaleón wrote:
> On Tue, 14 Feb 2012 12:14:50 -0800, Ross Boylan wrote:
> 
> > On Thu, 2012-02-09 at 14:51 +0000, Camaleón wrote:
> >> On Wed, 08 Feb 2012 22:44:45 -0800, Ross Boylan wrote:
> >> 
> >> > I've been losing network connections between my laptop and main
> >> > machine. The logs from the main machine are below.
> >> 
> >> I can't see them, either attached or linked :-?
> 
> > Thanks for your response.  I've been having some mail problems and only
> > just noticed it.
> > 
> > It's odd you don't see the logs; they show up in the archive.  I'll try
> > pasting them here:
> 
> Thanks! 
> 
> It has to be a problem with my newsreader (pan) that was not capable of 
> showing the logs. Weird.
> 
> >         Feb  8 19:45:40 corn kernel: [1987612.981170] ethfast: Detected Tx Unit Hang: 
> 
> (...)
> 
> >         Feb  8 19:45:49 corn kernel: [1987622.027816] NETDEV WATCHDOG: ethfast: transmit timed out
> >         Feb  8 19:45:52 corn kernel: [1987624.923313] ethfast: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> 
> By reading the logs, I can point you to these two bugs:
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518182
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=657689
Thank you.  I do not have more than 4 GB of RAM, but my recent network
upgrades took me from 100 Mb/s to 1000 Mb/s, so the load has definitely
gone up.  There were problems before, which may or may not have the same
cause.
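
One way to check whether the earlier problems share this cause is to count
the two kernel messages per day and per interface across the old logs. A
minimal sketch, assuming the logs are ordinary syslog text; the function
name and the sample list are made up for illustration, and the patterns
only mirror the two messages quoted in this thread:

```python
import re
from collections import Counter

# Patterns mirror the two kernel messages quoted in this thread.
HANG = re.compile(r"kernel: \[\s*[\d.]+\] (\S+): Detected Tx Unit Hang")
WATCHDOG = re.compile(r"NETDEV WATCHDOG: (\S+): transmit timed out")


def count_tx_events(lines):
    """Return a Counter keyed by (date, interface, kind)."""
    counts = Counter()
    for line in lines:
        for kind, pattern in (("hang", HANG), ("watchdog", WATCHDOG)):
            match = pattern.search(line)
            if match:
                # Syslog lines start with "Mon DD HH:MM:SS"; keep month and day.
                date = " ".join(line.split()[:2])
                counts[(date, match.group(1), kind)] += 1
    return counts


# Two lines copied from the log excerpt above, used as sample input.
sample = [
    "Feb  8 19:45:40 corn kernel: [1987612.981170] ethfast: Detected Tx Unit Hang: ",
    "Feb  8 19:45:49 corn kernel: [1987622.027816] NETDEV WATCHDOG: ethfast: transmit timed out",
]

for key, n in sorted(count_tx_events(sample).items()):
    print(*key, n)
```

To scan a real log, pass an open file object instead of the sample list.
Which file holds the kernel messages depends on the syslog configuration.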
> 
> As you are using lenny, 
Yes.
> I would try with an updated kernel (2.6.32) from 
> backports or better yet, take this as an opportunity to upgrade to 
> Squeeze or another supported version :-)
I want to upgrade, but need to test it and fix my mail first...
>         
> >> > Is there any way of telling from them if the network problem is
> >> > occurring on the local or remote (laptop) machine?
> >> 
> >> Ping from/to both machines and see the output.
> 
> > How will that tell me where the problem lies?  
> 
> Sure! I didn't see the logs, sorry. I thought you were having some sort 
> of disconnects from one of the computers.
> 
> > Here's what I get from the server while things are working OK: 
> > $ ping 192.168.40.30
> > PING 192.168.40.30 (192.168.40.30) 56(84) bytes of data.
> > 64 bytes from 192.168.40.30: icmp_seq=1 ttl=128 time=0.565 ms
> > 64 bytes from 192.168.40.30: icmp_seq=2 ttl=128 time=0.533 ms
> > $ traceroute 192.168.40.30
> > traceroute to 192.168.40.30 (192.168.40.30), 30 hops max, 40 byte packets
> >  1  cotton.betterworld.us (192.168.40.30)  24.460 ms * *
> > There aren't any intermediate steps so that I could see the packets
> > going part-way.
> 
> Mmm, okay. But are you pinging and tracerouting from/to the same host?
No.  That's the server pinging the laptop.
> 
> >> How are the computers connected, directly with a crossover network
> >> cable, using a switch, Internet (remote) connection...?
> 
> > Using a new D-Link Gigabit switch (Model DGS-1008G) and ethernet. I've
> > also tried wireless, which additionally uses a new D-Link Wireless N
> > router (Model DIR-601), i.e., laptop-> wireless -> switch -> server; the
> > laptop only has wireless G.  It's basically impossible to keep a good
> > connection up, though it works for a while after I start up.  The failure
> > is not limited to SAMBA.  The laptop is definitely not in good shape.
> 
> I see. Anyway, although the laptop is not at its best, the logs 
> concern the Linux box (the ethernet driver "hangs"). And one more 
> thing... "ethfast" looks like a 10/100 driver though it says "link up 
> 1000 Mbps". What kernel modules are you loading for both cards?
lsmod shows e100 and e1000e.  I don't think I've done any customization
related to these modules.  Here are some highlights from startup:
Jan 17 11:54:13 corn kernel: [    2.104915] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2
Jan 17 11:54:13 corn kernel: [    2.105673] e1000e: Copyright (c) 1999-2008 Intel Corporation.
Jan 17 11:54:13 corn kernel: [    2.105759] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
Jan 17 11:54:13 corn kernel: [    2.106703] PCI: Setting latency timer of device 0000:02:00.0 to 64
Jan 17 11:54:13 corn kernel: [    2.205678] No dock devices found.
Jan 17 11:54:13 corn kernel: [    2.228257] eth0: (PCI Express:2.5GB/s:Width x1) 00:13:20:b7:23:53
Jan 17 11:54:13 corn kernel: [    2.229019] eth0: Intel(R) PRO/1000 Network Connection
Jan 17 11:54:13 corn kernel: [    2.229807] eth0: MAC: 2, PHY: 2, PBA No: ffffff-0ff
Jan 17 11:54:13 corn kernel: [    2.230472] usbcore: registered new interface driver usbfs
Jan 17 11:54:13 corn kernel: [    2.231188] usbcore: registered new interface driver hub
Jan 17 11:54:13 corn kernel: [    2.240972] SCSI subsystem initialized
Jan 17 11:54:13 corn kernel: [    2.262687] usbcore: registered new device driver usb
Jan 17 11:54:13 corn kernel: [    2.306059] libata version 3.00 loaded.
Jan 17 11:54:13 corn kernel: [    2.306212] e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
Jan 17 11:54:13 corn kernel: [    2.339515] e100: Copyright(c) 1999-2006 Intel Corporation
Jan 17 11:54:13 corn kernel: [    2.383510] ACPI: PCI Interrupt 0000:05:01.0[A] -> GSI 22 (level, low) -> IRQ 22
Jan 17 11:54:13 corn kernel: [    2.431297] e100: eth1: e100_probe: addr 0x90028000, irq 22, MAC addr 00:...
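
The startup messages show which drivers load, but not which one ends up
behind the renamed "ethfast" interface. A minimal sketch for looking that
up, assuming the kernel exposes the usual sysfs layout in which
/sys/class/net/<iface>/device/driver is a symlink to the bound driver; the
function names are made up for illustration:

```python
import os


def driver_for(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to iface, or None if not found."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        # The symlink target ends in the driver's name, e.g. ".../drivers/e1000e".
        return os.path.basename(os.readlink(link))
    except OSError:
        # Virtual interfaces such as "lo" have no backing device.
        return None


def drivers_by_interface(sysfs_root="/sys/class/net"):
    """Map every interface listed under sysfs_root to its driver name."""
    if not os.path.isdir(sysfs_root):
        return {}
    return {
        iface: driver_for(iface, sysfs_root)
        for iface in sorted(os.listdir(sysfs_root))
    }


for name, driver in drivers_by_interface().items():
    print(name, driver)
```

If "ethfast" comes back as e1000e, the hang messages belong to the gigabit
card rather than the e100 one.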

Thank you so much for the diagnosis; the network problems have been
driving me nuts, but the server is the last place I thought would be
responsible.  Perhaps this also has something to do with the fact that
throughput has topped out at 300 Mb/s, and that imposes a high CPU load
on the laptop.

Ross


