
Re: NFS disconnects periodically



On Wednesday, 8 December 2004 13:07, Ralf Gesel|ensetter wrote:
> I got the feeling that our tjener is to blame for periodic hangs I
> observed at thin clients and workstations. LTSPserver reports
> something like: NFS not reachable (time-out) / NFS connected ok.

Hi there,

this is an update on the current situation:
Today I was able to work fine on the terminals (thin clients) for about
60 minutes. Then, when the pupils were about to save their files, the
system hung again and again. The same happened in the neighbouring lab,
where Windows NT clients access Samba on tjener: when I clicked "Save as"
and the file selection box was supposed to display the contents of some
user's subfolder, all machines hung for about 10 seconds, as if their
NICs had been unplugged.

I now know a bit more for further debugging and would like to ask for 
your assistance:

On Tjener:

ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0A:E4:0B:9E:7E
          inet addr:10.0.2.2  Bcast:10.0.3.255  Mask:255.255.254.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
*         RX packets:4913596 errors:4202 dropped:0 overruns:0 frame:4202
          TX packets:5567365 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
X         RX bytes:1336038573 (1.2 GiB)  TX bytes:4003539735 (3.7 GiB)
          Base address:0x4000 Memory:fcde0000-fce00000

On LTSP:

eth1      Link encap:Ethernet  HWaddr 00:0C:76:1A:91:15
          inet addr:10.0.2.10  Bcast:10.0.3.255  Mask:255.255.254.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
*         RX packets:1433622 errors:4384 dropped:0 overruns:0 frame:4384
          TX packets:1637151 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
*         RX bytes:324446685 (309.4 MiB)  TX bytes:655561930 (625.1 MiB)
          Interrupt:20 Base address:0xc400 Memory:fe5fd000-fe5fd038
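
To put the RX error counters from the two ifconfig outputs into 
proportion, they can be expressed as a share of received packets. This 
is only a minimal sketch in Python; the function name rx_error_rate is 
made up for illustration, and the numbers are copied from the output 
above.

```python
def rx_error_rate(rx_packets: int, rx_errors: int) -> float:
    """Return RX errors as a fraction of all received packets."""
    if rx_packets == 0:
        return 0.0
    return rx_errors / rx_packets


# Counters copied from the ifconfig output of tjener (eth0) and LTSP (eth1).
tjener_rate = rx_error_rate(4913596, 4202)
ltsp_rate = rx_error_rate(1433622, 4384)

print(f"tjener eth0: {tjener_rate:.4%} of received packets had errors")
print(f"LTSP   eth1: {ltsp_rate:.4%} of received packets had errors")
```

On both machines every RX error is also counted as a frame error 
(errors and frame show the same value).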


But as a matter of fact, the hangs also affect Samba clients.

Here are some interesting snippets from dmesg (no time stamps) that 
were logged during the network outage:

> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> usb.c: registered new driver usbdevfs
> usb.c: registered new driver hub
> usb-ohci.c: USB OHCI at membase 0xf9b04000, IRQ 11
> usb-ohci.c: usb-00:0f.2, ServerWorks CSB6 OHCI USB Controller
> usb.c: new USB bus registered, assigned bus number 1
> hub.c: USB hub found
> hub.c: 4 ports detected
> usb-uhci.c: $Revision: 1.275 $ time 19:31:15 Jun 16 2004
> usb-uhci.c: High bandwidth mode enabled
> usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
> uhci.c: USB Universal Host Controller Interface driver v1.1
> Real Time Clock Driver v1.10f
> usb-uhci.c: $Revision: 1.275 $ time 19:31:15 Jun 16 2004
> usb-uhci.c: High bandwidth mode enabled
> usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
> uhci.c: USB Universal Host Controller Interface driver v1.1
> Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> e1000: eth0 NIC Link is Down
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> e1000: eth0 NIC Link is Down
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> fh_verify: no root_squashed access at bs-niben/.xsession-errors.
> request_module[net-pf-10]: waitpid(29297,...) failed, errno 512
> request_module[net-pf-10]: waitpid(4848,...) failed, errno 512
> request_module[net-pf-10]: waitpid(4528,...) failed, errno 512
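
The link state changes in the dmesg excerpt above can be counted with a 
few lines of Python. This is a sketch only: the helper name 
count_link_events is hypothetical, and the sample list contains just the 
e1000 lines quoted above.

```python
import re

# Matches lines such as "e1000: eth0 NIC Link is Up 100 Mbps Full Duplex".
LINK_RE = re.compile(r"e1000: (\S+) NIC Link is (Up|Down)")


def count_link_events(lines):
    """Count 'Up' and 'Down' link messages in an iterable of dmesg lines."""
    counts = {"Up": 0, "Down": 0}
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# The e1000 lines from the excerpt, in their original order.
sample = [
    "> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex",
    "> e1000: eth0 NIC Link is Down",
    "> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex",
    "> e1000: eth0 NIC Link is Down",
    "> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex",
    "> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex",
    "> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex",
]

link_counts = count_link_events(sample)
print(link_counts)
```

Since dmesg carries no time stamps here, this only tells how often the 
link changed, not when.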

I have already replaced the CAT cable.
And this is a snippet from syslog that also coincides with the outage:

> Dec 10 11:04:35 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:04:37 tjener imapd-ssl: Connection, ip=[::ffff:10.0.2.2]
> Dec 10 11:04:37 tjener imapd-ssl: LOGOUT, ip=[::ffff:10.0.2.2]
> Dec 10 11:04:53 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:04:56 tjener dhcpd: DHCPREQUEST for 10.0.3.240 from 00:0c:76:3d:eb:86 (bib-50) via eth0
> Dec 10 11:04:56 tjener dhcpd: DHCPACK on 10.0.3.240 to 00:0c:76:3d:eb:86 (bib-50) via eth0
> Dec 10 11:04:56 tjener named[473]: client 10.0.3.240#1452: update '0.10.in-addr.arpa/IN' denied
> Dec 10 11:04:57 tjener named[473]: client 10.0.3.240#1459: update '0.10.in-addr.arpa/IN' denied
> Dec 10 11:05:01 tjener /USR/SBIN/CRON[29371]: (nobody) CMD (/usr/bin/wget --proxy=off --output-document=- http://www.skolelinux.de/G
> Dec 10 11:05:01 tjener /USR/SBIN/CRON[29372]: (root) CMD (test -x /usr/lib/sysstat/sa1 && /usr/lib/sysstat/sa1)
> Dec 10 11:05:01 tjener /USR/SBIN/CRON[29374]: (root) CMD (if [ -x /usr/bin/filehandle_ctl.sh ] ; then /usr/bin/filehandle_ctl.sh ; f
> Dec 10 11:05:01 ltspserver00 /USR/SBIN/CRON[641]: (root) CMD (test -x /usr/lib/sysstat/sa1 && /usr/lib/sysstat/sa1)
> Dec 10 11:05:02 ltspserver00 /USR/SBIN/CRON[642]: (root) CMD (if [ -x /usr/bin/filehandle_ctl.sh ] ; then /usr/bin/filehandle_ctl.sh
> Dec 10 11:05:02 dhcp136 /USR/SBIN/CRON[1839]: (root) CMD (test -x /usr/lib/sysstat/sa1 && /usr/lib/sysstat/sa1)
> Dec 10 11:05:02 dhcp136 /USR/SBIN/CRON[1840]: (root) CMD (if [ -x /usr/bin/filehandle_ctl.sh ] ; then /usr/bin/filehandle_ctl.sh ; f
> Dec 10 11:05:44 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener OK

Thanks in advance,
regards
Ralf

P.S.: Here are some lines from kern.log on tjener to show the frequency:

> Dec 10 09:36:38 dhcp136 kernel: agpgart: AGP aperture is 64M @ 0xf8000000
> Dec 10 10:18:24 dhcp136 kernel: nfs: server tjener not responding, still trying
> Dec 10 10:18:25 dhcp136 kernel: nfs: server tjener OK
> Dec 10 10:27:04 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 10:27:05 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 10:48:59 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 10:49:04 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:00:45 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:01:14 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:04:35 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:04:53 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:05:44 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:06:31 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:06:37 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:07:06 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:07:47 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:08:18 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:08:18 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:08:54 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:08:54 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:08:58 dhcp136 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:09:22 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:09:22 dhcp136 kernel: nfs: server tjener OK
> Dec 10 11:09:22 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:09:22 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:10:55 dhcp136 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:11:22 dhcp136 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:11:22 dhcp136 kernel: nfs: server tjener OK
> Dec 10 11:11:46 dhcp136 kernel: nfs: server tjener OK
> Dec 10 11:12:10 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:12:31 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:12:31 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:13:01 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:13:08 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:13:34 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:13:34 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:14:20 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:14:20 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:14:34 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:14:34 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:15:01 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:15:12 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:15:24 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:15:41 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:16:31 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:17:53 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:18:04 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:18:04 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:18:11 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:19:16 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:19:16 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:19:31 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:19:33 ltspserver00 kernel: nfs: server tjener OK
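
To get outage durations out of such a log instead of reading them by 
eye, the "not responding" / "OK" pairs can be matched per client. The 
following Python sketch is an illustration, not a finished tool: the 
function name nfs_outages is made up, the year 2004 is assumed because 
syslog time stamps carry none, each log entry is expected on a single 
line (not wrapped by the mail client), and English month names are 
assumed for parsing.

```python
import re
from datetime import datetime

# Time stamp, client host name and state of an NFS kernel message.
NFS_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: "
    r"nfs: server \S+ (not responding|OK)"
)


def nfs_outages(lines, year=2004):
    """Return (client, seconds) for each 'not responding' .. 'OK' span.

    The year is an assumption; syslog does not record it.
    """
    open_since = {}  # client -> time of the first unanswered message
    outages = []
    for raw in lines:
        match = NFS_RE.match(raw.lstrip("> "))  # drop mail quote prefix
        if not match:
            continue
        stamp, host, state = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state == "not responding":
            # Keep the earliest time stamp if the message repeats.
            open_since.setdefault(host, when)
        elif host in open_since:
            start = open_since.pop(host)
            outages.append((host, (when - start).total_seconds()))
        # An "OK" without a preceding "not responding" is ignored.
    return outages


sample = [
    "> Dec 10 11:04:35 ltspserver00 kernel: nfs: server tjener not responding, still trying",
    "> Dec 10 11:04:53 ltspserver00 kernel: nfs: server tjener OK",
    "> Dec 10 11:05:44 ltspserver00 kernel: nfs: server tjener not responding, still trying",
    "> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener not responding, still trying",
    "> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener OK",
    "> Dec 10 11:06:31 ltspserver00 kernel: nfs: server tjener OK",
]

outages = nfs_outages(sample)
for host, seconds in outages:
    print(f"{host}: {seconds:.0f} s without NFS")
```

Repeated "not responding" messages are folded into one outage that 
starts at the first of them, so the durations are lower bounds on how 
long a client actually waited.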


