Re: NFS disconnects periodically
On Wednesday, 8 December 2004 13:07, Ralf Gesellensetter wrote:
> I have the feeling that our tjener is to blame for the periodic hangs I
> observed on thin clients and workstations. The LTSP server reports
> something like: NFS not reachable (time-out) / NFS connected ok.
Hi there,
here is an update on the current situation:
Today I was able to work normally on the terminals (thin clients) for
about 60 minutes. Then, when the pupils tried to save their files, the
system hung again and again. The same happened in the neighbouring lab,
where Windows NT clients access Samba on tjener: when I clicked "Save as"
and the file selection dialog was supposed to display the contents of a
user's subfolder, all machines hung for about 10 seconds, as if their
NICs had been unplugged.
I now have a bit more information for further debugging and would be
grateful for your assistance:
On Tjener:
ifconfig
eth0 Link encap:Ethernet HWaddr 00:0A:E4:0B:9E:7E
inet addr:10.0.2.2 Bcast:10.0.3.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4913596 errors:4202 dropped:0 overruns:0 frame:4202
TX packets:5567365 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1336038573 (1.2 GiB) TX bytes:4003539735 (3.7 GiB)
Base address:0x4000 Memory:fcde0000-fce00000
On the LTSP server:
eth1 Link encap:Ethernet HWaddr 00:0C:76:1A:91:15
inet addr:10.0.2.10 Bcast:10.0.3.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1433622 errors:4384 dropped:0 overruns:0 frame:4384
TX packets:1637151 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:324446685 (309.4 MiB) TX bytes:655561930 (625.1 MiB)
Interrupt:20 Base address:0xc400 Memory:fe5fd000-fe5fd038
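To put the RX error counters above into proportion, here is a minimal
Python sketch that pulls the RX counters out of net-tools style ifconfig
output and computes the error ratio. The regular expression and the
helper names rx_counters and rx_error_ratio are my own assumptions for
illustration; the counter lines are copied from the output above.

```python
import re

# Assumed format: the RX counter line of net-tools ifconfig output, e.g.
# "RX packets:4913596 errors:4202 dropped:0 overruns:0 frame:4202"
RX_RE = re.compile(
    r"RX packets:(?P<packets>\d+)\s+errors:(?P<errors>\d+)\s+"
    r"dropped:(?P<dropped>\d+)\s+overruns:(?P<overruns>\d+)\s+frame:(?P<frame>\d+)"
)


def rx_counters(ifconfig_text: str) -> dict[str, int]:
    """Return the first set of RX counters found in the text (hypothetical helper)."""
    m = RX_RE.search(ifconfig_text)
    if m is None:
        raise ValueError("no RX counter line found")
    return {name: int(value) for name, value in m.groupdict().items()}


def rx_error_ratio(ifconfig_text: str) -> float:
    """Fraction of received packets that the driver counted as errors."""
    c = rx_counters(ifconfig_text)
    return c["errors"] / c["packets"] if c["packets"] else 0.0


# Counter lines copied from the ifconfig output in this mail.
samples = {
    "tjener": "RX packets:4913596 errors:4202 dropped:0 overruns:0 frame:4202",
    "ltsp": "RX packets:1433622 errors:4384 dropped:0 overruns:0 frame:4384",
}

for host, text in samples.items():
    c = rx_counters(text)
    print(f"{host}: {rx_error_ratio(text):.3%} RX errors, "
          f"{c['frame']} of {c['errors']} are frame errors")
```

On both machines every RX error is also counted as a frame error, which
the last print line makes easy to see.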
But in fact, the hangs also affect the Samba clients.
Here are some interesting snippets from dmesg (no timestamps) that were
produced during the network outage:
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> usb.c: registered new driver usbdevfs
> usb.c: registered new driver hub
> usb-ohci.c: USB OHCI at membase 0xf9b04000, IRQ 11
> usb-ohci.c: usb-00:0f.2, ServerWorks CSB6 OHCI USB Controller
> usb.c: new USB bus registered, assigned bus number 1
> hub.c: USB hub found
> hub.c: 4 ports detected
> usb-uhci.c: $Revision: 1.275 $ time 19:31:15 Jun 16 2004
> usb-uhci.c: High bandwidth mode enabled
> usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
> uhci.c: USB Universal Host Controller Interface driver v1.1
> Real Time Clock Driver v1.10f
> usb-uhci.c: $Revision: 1.275 $ time 19:31:15 Jun 16 2004
> usb-uhci.c: High bandwidth mode enabled
> usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
> uhci.c: USB Universal Host Controller Interface driver v1.1
> Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> e1000: eth0 NIC Link is Down
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> e1000: eth0 NIC Link is Down
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
> fh_verify: no root_squashed access at bs-niben/.xsession-errors.
> request_module[net-pf-10]: waitpid(29297,...) failed, errno 512
> request_module[net-pf-10]: waitpid(4848,...) failed, errno 512
> request_module[net-pf-10]: waitpid(4528,...) failed, errno 512
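The "NIC Link is Down" lines show that eth0 on tjener lost its link at
least twice in this snippet. A minimal Python sketch for counting such
events, assuming the dmesg output has been saved as plain text; the
function name link_events is hypothetical, and the sample is the e1000
part of the snippet above.

```python
import re
from collections import Counter

# Matches e1000 link-state messages such as
# "e1000: eth0 NIC Link is Up 100 Mbps Full Duplex" or "e1000: eth0 NIC Link is Down".
LINK_RE = re.compile(r"e1000: (?P<iface>\S+) NIC Link is (?P<state>Up|Down)")


def link_events(dmesg_text: str) -> Counter:
    """Count Up/Down link events per interface (hypothetical helper)."""
    events = Counter()
    for line in dmesg_text.splitlines():
        m = LINK_RE.search(line)
        if m:
            events[(m.group("iface"), m.group("state"))] += 1
    return events


sample = """\
e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
e1000: eth0 NIC Link is Down
e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
e1000: eth0 NIC Link is Down
e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
"""

counts = link_events(sample)
print(f"eth0: {counts[('eth0', 'Down')]} link-down, "
      f"{counts[('eth0', 'Up')]} link-up events")
```

Running this over a full dmesg or kern.log would show whether the link
drops line up with the NFS timeouts.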
I have already replaced the CAT cable.
And here is a snippet from syslog, also covering the outage:
> Dec 10 11:04:35 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:04:37 tjener imapd-ssl: Connection, ip=[::ffff:10.0.2.2]
> Dec 10 11:04:37 tjener imapd-ssl: LOGOUT, ip=[::ffff:10.0.2.2]
> Dec 10 11:04:53 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:04:56 tjener dhcpd: DHCPREQUEST for 10.0.3.240 from 00:0c:76:3d:eb:86 (bib-50) via eth0
> Dec 10 11:04:56 tjener dhcpd: DHCPACK on 10.0.3.240 to 00:0c:76:3d:eb:86 (bib-50) via eth0
> Dec 10 11:04:56 tjener named[473]: client 10.0.3.240#1452: update '0.10.in-addr.arpa/IN' denied
> Dec 10 11:04:57 tjener named[473]: client 10.0.3.240#1459: update '0.10.in-addr.arpa/IN' denied
> Dec 10 11:05:01 tjener /USR/SBIN/CRON[29371]: (nobody) CMD (/usr/bin/wget --proxy=off --output-document=- http://www.skolelinux.de/G
> Dec 10 11:05:01 tjener /USR/SBIN/CRON[29372]: (root) CMD (test -x /usr/lib/sysstat/sa1 && /usr/lib/sysstat/sa1)
> Dec 10 11:05:01 tjener /USR/SBIN/CRON[29374]: (root) CMD (if [ -x /usr/bin/filehandle_ctl.sh ] ; then /usr/bin/filehandle_ctl.sh ; f
> Dec 10 11:05:01 ltspserver00 /USR/SBIN/CRON[641]: (root) CMD (test -x /usr/lib/sysstat/sa1 && /usr/lib/sysstat/sa1)
> Dec 10 11:05:02 ltspserver00 /USR/SBIN/CRON[642]: (root) CMD (if [ -x /usr/bin/filehandle_ctl.sh ] ; then /usr/bin/filehandle_ctl.sh
> Dec 10 11:05:02 dhcp136 /USR/SBIN/CRON[1839]: (root) CMD (test -x /usr/lib/sysstat/sa1 && /usr/lib/sysstat/sa1)
> Dec 10 11:05:02 dhcp136 /USR/SBIN/CRON[1840]: (root) CMD (if [ -x /usr/bin/filehandle_ctl.sh ] ; then /usr/bin/filehandle_ctl.sh ; f
> Dec 10 11:05:44 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener OK
>
Thanks in advance,
regards
Ralf
P.S.: Here are some lines from kern.log on tjener to show the frequency:
> Dec 10 09:36:38 dhcp136 kernel: agpgart: AGP aperture is 64M @ 0xf8000000
> Dec 10 10:18:24 dhcp136 kernel: nfs: server tjener not responding, still trying
> Dec 10 10:18:25 dhcp136 kernel: nfs: server tjener OK
> Dec 10 10:27:04 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 10:27:05 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 10:48:59 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 10:49:04 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:00:45 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:01:14 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:04:35 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:04:53 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:05:44 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:06:31 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:06:37 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:07:06 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:07:47 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:08:18 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:08:18 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:08:54 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:08:54 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:08:58 dhcp136 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:09:22 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:09:22 dhcp136 kernel: nfs: server tjener OK
> Dec 10 11:09:22 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:09:22 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:10:55 dhcp136 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:11:22 dhcp136 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:11:22 dhcp136 kernel: nfs: server tjener OK
> Dec 10 11:11:46 dhcp136 kernel: nfs: server tjener OK
> Dec 10 11:12:10 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:12:31 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:12:31 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:13:01 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:13:08 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:13:34 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:13:34 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:14:20 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:14:20 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:14:34 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:14:34 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:15:01 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:15:12 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:15:24 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:15:41 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:16:31 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:17:53 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:18:04 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:18:04 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:18:11 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:19:16 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:19:16 ltspserver00 kernel: nfs: server tjener OK
> Dec 10 11:19:31 ltspserver00 kernel: nfs: server tjener not responding, still trying
> Dec 10 11:19:33 ltspserver00 kernel: nfs: server tjener OK
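To turn the kern.log excerpt above into a list of outage durations per
client, here is a minimal Python sketch. The log sometimes shows two
"not responding" lines before an "OK", or two "OK" lines in a row, so the
sketch simply treats the first "not responding" as the start and the first
following "OK" as the end of an outage; this is a simplification. The year
2004 is an assumption taken from the date of this thread, since syslog
timestamps carry no year, and the function name outages is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches kernel NFS client messages as they appear in the kern.log excerpt, e.g.
# "Dec 10 11:04:35 ltspserver00 kernel: nfs: server tjener not responding, still trying"
NFS_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"nfs: server (?P<server>\S+) (?P<status>not responding|OK)"
)

YEAR = 2004  # assumption: syslog lines carry no year


def outages(log_text: str) -> list[tuple[str, datetime, timedelta]]:
    """Return (client host, start, duration) for each not-responding -> OK span.

    Repeated "not responding" lines keep the earliest start; an "OK" with no
    open outage for that host is ignored.
    """
    open_since: dict[str, datetime] = {}
    result = []
    for line in log_text.splitlines():
        m = NFS_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(f"{YEAR} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        host = m.group("host")
        if m.group("status") == "not responding":
            open_since.setdefault(host, ts)
        elif host in open_since:
            start = open_since.pop(host)
            result.append((host, start, ts - start))
    return result


sample = """\
Dec 10 10:18:24 dhcp136 kernel: nfs: server tjener not responding, still trying
Dec 10 10:18:25 dhcp136 kernel: nfs: server tjener OK
Dec 10 11:04:35 ltspserver00 kernel: nfs: server tjener not responding, still trying
Dec 10 11:04:53 ltspserver00 kernel: nfs: server tjener OK
Dec 10 11:05:44 ltspserver00 kernel: nfs: server tjener not responding, still trying
Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener not responding, still trying
Dec 10 11:06:00 ltspserver00 kernel: nfs: server tjener OK
Dec 10 11:06:31 ltspserver00 kernel: nfs: server tjener OK
"""

for host, start, duration in outages(sample):
    print(f"{start:%H:%M:%S} {host}: {int(duration.total_seconds())} s")
```

Feeding it the whole kern.log would give the gaps between outages as well,
which might help correlate them with the cron jobs or the link drops.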