ssh problems...
I am running slink on a cluster of LX164s. Uptime is good, but I keep
running into a problem with ssh, almost like clockwork. A simple reboot
fixes everything, but I would like to know what is going on.
I am using ssh 1.2.26-1.2. (I would upgrade to 1.2.27, but it is only in
the unstable tree; does anyone have it built for slink?) Here is the
output when ssh'ing to a machine that does not work:
[ 15:22:08 : root@kucalc : /usr/local/src ]
(44) # ssh -v compute9
SSH Version 1.2.26 [alpha-unknown-linux], protocol version 1.5.
Standard version. Does not use RSAREF.
kucalc: Reading configuration data /etc/ssh/ssh_config
kucalc: Applying options for *
kucalc: ssh_connect: getuid 0 geteuid 0 anon 0
kucalc: Connecting to compute9 [192.168.1.10] port 22.
kucalc: Allocated local port 1017.
kucalc: Connection established.
(hang here)
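The hang comes right after "Connection established.", which is the point where the client is waiting for the server to send its identification line (for SSH protocol 1.5, a line beginning "SSH-1.5-"). A minimal Python sketch, not tested against this cluster, that connects to port 22 and waits a bounded time for that line could tell a completely silent sshd apart from one that answers and then stalls later. read_ssh_banner and parse_ssh_banner are hypothetical helper names chosen for illustration.

```python
import socket


def read_ssh_banner(host, port=22, timeout=10.0):
    """Connect to host:port and return the server's identification line.

    Returns None if the TCP connection succeeds but no complete line
    arrives within `timeout` seconds (or the server closes without
    sending anything), which matches "Connection established." followed
    by a hang.  Connection errors (refused, unreachable) are raised.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        try:
            while b"\n" not in data:
                chunk = sock.recv(256)
                if not chunk:  # server closed the connection
                    break
                data += chunk
        except socket.timeout:
            return None
    line = data.split(b"\n", 1)[0].rstrip(b"\r")
    return line.decode("ascii", errors="replace") if line else None


def parse_ssh_banner(line):
    """Split 'SSH-<protoversion>-<softwareversion>' into its two parts."""
    if not line.startswith("SSH-"):
        raise ValueError("not an SSH identification string: %r" % line)
    proto, _, software = line[4:].partition("-")
    return proto, software
```

For example, read_ssh_banner("compute9") returning None would correspond to the hang above, while a working node should return its identification line, which parse_ssh_banner splits into protocol and software versions.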
As you can see, the destination never responds. So I tried rpcinfo to
get more information. When I run rpcinfo -p compute9, the program
never returns. Could this be a portmap problem?
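rpcinfo -p asks the portmapper (RPC program 100000, version 2) for its registration table via the DUMP procedure (procedure 4). The sketch below, assuming Python 3 and a portmapper listening on TCP port 111, sends that single call by hand with a timeout, so a hung portmapper shows up as False instead of a command that never returns. build_pmap_dump_call and probe_portmap are hypothetical names; the sketch only encodes the call and checks whether any reply arrives, it does not decode the reply.

```python
import socket
import struct

PMAP_PROG = 100000   # portmapper program number
PMAP_VERS = 2        # portmapper version
PMAPPROC_DUMP = 4    # procedure that lists all registrations


def build_pmap_dump_call(xid):
    """Encode an ONC RPC CALL for PMAPPROC_DUMP with AUTH_NULL
    credentials, prefixed by the TCP record-marking header
    (high bit = last fragment, low 31 bits = fragment length)."""
    body = struct.pack(
        ">10I",
        xid,                              # transaction id
        0,                                # msg_type = CALL
        2,                                # RPC protocol version
        PMAP_PROG, PMAP_VERS, PMAPPROC_DUMP,
        0, 0,                             # credential: AUTH_NULL, length 0
        0, 0,                             # verifier: AUTH_NULL, length 0
    )
    return struct.pack(">I", 0x80000000 | len(body)) + body


def probe_portmap(host, timeout=5.0, xid=0x1234):
    """Send PMAPPROC_DUMP to host's TCP port 111.

    Returns True if any reply bytes arrive within `timeout` seconds,
    False if nothing arrives (timeout or connection closed)."""
    with socket.create_connection((host, 111), timeout=timeout) as sock:
        sock.settimeout(timeout)
        sock.sendall(build_pmap_dump_call(xid))
        try:
            header = sock.recv(4)
        except socket.timeout:
            return False
    return len(header) > 0
```

Running probe_portmap("compute9") against a working and a non-working node side by side should show whether the portmapper itself is the part that stops answering.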
All machines are running kernel 2.2.1. BTW, NFS is working on all the machines.
Here is the ps aux output for two nodes: one that is working (i.e. ssh
and rpcinfo to that host succeed) and one that is not.
First, the working node:
USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
daemon     104  0.0  0.5  3136   656 ?   S    Aug 6   0:00 /sbin/portmap
daemon     139  0.0  0.5  3136   728 ?   S    Aug 6   0:00 /usr/sbin/atd
root         1  0.0  0.5  2064   696 ?   S    Aug 6   0:05 init
root         2  0.0  0.0     0     0 ?   SW   Aug 6   0:00 (kflushd)
root         3  0.0  0.0     0     0 ?   SW   Aug 6   0:00 (kswapd)
root        12  0.0  0.3  2008   448 ?   S    Aug 6   0:00 update
root        97  0.0  0.7  5352   944 ?   S    Aug 6   0:00 /sbin/syslogd
root        99  0.0  0.8  2448  1088 ?   S    Aug 6   0:00 /sbin/klogd
root       106  0.0  0.7  5296   912 ?   S    Aug 6   0:00 /usr/sbin/inetd
root       111  0.0  0.5  2072   640 ?   S    Aug 6   0:00 /usr/sbin/gpm -m /dev
root       114  0.0  0.9  7904  1152 ?   S    Aug 6   0:04 /usr/sbin/sshd
root       117  0.0  1.9  2424  2424 ?   S    Aug 6   0:00 /usr/sbin/xntpd
root       132  0.0  0.9  7568  1208 ?   S    Aug 6   0:00 /usr/sbin/amd -l sysl
root       136  0.0  0.0     0     0 ?   SW   Aug 6   0:00 (rpciod)
root       137  0.0  0.0     0     0 ?   SW   Aug 6   0:00 (lockd)
root       142  0.0  0.6  3152   800 ?   S    Aug 6   0:00 /usr/sbin/cron
root       146  0.0  0.5  3120   728 2   S    Aug 6   0:00 /sbin/getty 38400 tty
root       147  0.0  0.5  3120   728 3   S    Aug 6   0:00 /sbin/getty 38400 tty
root       148  0.0  0.5  3120   728 4   S    Aug 6   0:00 /sbin/getty 38400 tty
root       149  0.0  0.5  3120   728 S0  S    Aug 6   0:00 /sbin/getty -L ttyS0
sauter     188  0.0  1.5  7616  1968 1   S    Aug 6   0:00 -bash
sauter     631 99.9  0.5  3088   688 ?   R    01:38 825:25 /users/sauter/csource
sauter    1306  0.0  0.6  4224   832 1   R    15:24   0:00 ps aux
And the non-working node:
USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
daemon      98  0.0  0.0  3136     0 ?   SW   Jun 20  0:00 (portmap)
daemon     133  0.0  0.1  3136   168 ?   S    Jun 20  0:00 /usr/sbin/atd
root         1  0.0  0.1  2064   144 ?   S    Jun 20  0:05 init
root         2  0.0  0.0     0     0 ?   SW   Jun 20  0:00 (kflushd)
root         3  0.0  0.0     0     0 ?   SW   Jun 20  0:01 (kswapd)
root        12  0.0  0.0  2008    48 ?   S    Jun 20  0:01 update
root        91  0.0  0.4  5352   568 ?   S    Jun 20  0:05 /sbin/syslogd
root        93  0.0  0.2  2448   264 ?   S    Jun 20  0:00 /sbin/klogd
root       100  0.0  0.0  5296     0 ?   SW   Jun 20  0:00 (inetd)
root       105  0.0  0.0  2072    64 ?   S    Jun 20  0:00 /usr/sbin/gpm -m /dev
root       108  0.0  0.5  7984   680 ?   S    Jun 20  0:57 /usr/sbin/sshd
root       126  0.0  0.5  7568   640 ?   S    Jun 20  0:01 /usr/sbin/amd -l sysl
root       130  0.0  0.0     0     0 ?   SW   Jun 20  0:00 (rpciod)
root       131  0.0  0.0     0     0 ?   SW   Jun 20  0:00 (lockd)
root       136  0.0  0.2  3152   280 ?   S    Jun 20  0:00 /usr/sbin/cron
root       140  0.0  0.0  3120     0 2   SW   Jun 20  0:00 (getty)
root       141  0.0  0.0  3120     0 3   SW   Jun 20  0:00 (getty)
root       142  0.0  0.0  3120     0 4   SW   Jun 20  0:00 (getty)
root       143  0.0  0.0  3120     0 S0  SW   Jun 20  0:00 (getty)
root      2408  0.0  1.9  2424  2424 ?   S    Aug 1   0:00 /usr/sbin/xntpd
sauter     139  0.0  1.5  7616  1960 1   S    Jun 20  0:00 -bash
sauter    6742 99.0  0.5  3088   688 ?   R    01:39 818:10 /users/sauter/csource
sauter    7378  0.0  0.6  4224   832 1   R    15:25   0:00 ps aux
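One difference between the two listings stands out: on the non-working node, portmap, inetd and the four gettys have RSS 0 with a W in STAT (no resident pages) and their names appear in parentheses, whereas on the working node the same daemons are resident. Whether that is the cause of the hang is not established here. A small Python sketch that picks such processes out of ps aux output, assuming the column layout shown above, could make comparing all the nodes quicker; swapped_out is a hypothetical helper name.

```python
def swapped_out(ps_lines):
    """Return (user, pid, command) tuples for user processes with no
    resident pages: RSS 0 and a 'W' in STAT.

    Kernel threads (SIZE 0, e.g. kflushd) are skipped.  Assumes the
    'ps aux' column layout shown above:
        USER PID %CPU %MEM SIZE RSS TTY STAT START TIME COMMAND
    where START is either one token ("01:39") or two ("Jun 20").
    """
    found = []
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 10 or fields[0] == "USER":
            continue  # skip the header and lines too short to parse
        user, pid = fields[0], int(fields[1])
        size, rss, stat = int(fields[4]), int(fields[5]), fields[7]
        # A month name in START means the date spans two tokens.
        start_tokens = 2 if fields[8].isalpha() else 1
        command = " ".join(fields[9 + start_tokens:])
        if size > 0 and rss == 0 and "W" in stat:
            found.append((user, pid, command))
    return found
```

Fed the non-working listing above, it returns portmap (98), inetd (100) and the gettys (140-143); fed the working listing, it returns an empty list, since the only SW entries there are kernel threads.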
Thanks!
Lon
---
Lonnie Sauter
Department of Mathematics http://www.math.ukans.edu/~sauter
651 Snow Hall (785)864-3913 Office
University of Kansas (785)864-5255 Fax
Lawrence, KS 66045
---