
Re: Alpha (Multia) - NFS server



Marcus Williams wrote:

> last week. The server froze completely and had to be rebooted. On
> reboot there were no messages in the system logs that point to
> any nasty errors that could have caused the problem. I put this

I have this problem constantly on a box I want to use as a firewall.  If
you're not in X and you watch the console (you have to get there before
the screen blanks), you may have better luck catching an error message.

I have two such machines - one runs RedHat, has all external SCSI
devices, and runs flawlessly.  Current uptime is 63 days, but it's been
much higher.  My other multia runs potato, has a single internal SCSI
device, and crashes constantly.

styx:/etc# ps aux
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  2.0  2512  872 ?        S    11:54   0:01 init 
root         2  0.0  0.0     0    0 ?        SW   11:54   0:00 [kswapd]
root         3  0.0  0.0     0    0 ?        SW   11:54   0:00 [kflushd]
root         4  0.0  0.0     0    0 ?        SW   11:54   0:00 [kupdate]
root       100  0.0  2.8  6128 1216 ?        S    11:55   0:00 /sbin/syslogd
root       102  0.0  2.0  2560  880 ?        S    11:55   0:00 /sbin/klogd
root       108  0.0  1.7  2472  768 ?        S    11:55   0:00 /usr/sbin/inetd
root       121  0.0  3.9 12448 1696 ?        S    11:55   0:04 /usr/sbin/sshd
root       126  0.0  2.0  2480  856 tty1     S    11:55   0:00 /sbin/getty 38400
root       127  1.9  3.9  7472 1680 tty12    S    11:55   3:15 top -s
root       128  0.0  6.1 25648 2624 ?        S    12:03   0:01 /usr/sbin/sshd
root       129  0.0  4.9  8080 2104 pts/0    S    12:03   0:00 -bash
root       142  0.0  3.2  8064 1384 pts/0    R    14:42   0:01 ps aux

Not too much going on here.  The only script that runs enables IP
forwarding and sets up some redirection; everything else is open.  This
is the script after removing the boilerplate (start|stop):

# (Dis)Allow forwarding
echo $FORWARD > /proc/sys/net/ipv4/ip_forward

# Set the default policy
iptables -P FORWARD $POLICY

# Only llama can make it out port 80 safely
iptables -$RULING PREROUTING -t nat -p tcp --dport 80 --src llama -j ACCEPT

# All others get redirected to the squid on llama
iptables -$RULING PREROUTING -t nat -p tcp --dport 80 -j DNAT --to $SQUID

# We can also redirect external requests into our cache
iptables -$RULING PREROUTING -t nat -p tcp --dport 3128 -j DNAT --to $SQUID

# Masquerade all connections
iptables -$RULING POSTROUTING -t nat -j MASQUERADE
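
The boilerplate that sets $FORWARD, $POLICY, $RULING and $SQUID is not
shown above.  A minimal sketch of what it might look like follows; the
function name set_mode, the address in SQUID, and the DROP policy on
stop are all assumptions made for illustration, not taken from the
original script:

```shell
#!/bin/sh
# Hypothetical reconstruction of the removed start|stop boilerplate.
# Every value below is an assumption for illustration only.

SQUID=192.0.2.10:3128   # placeholder address:port for the squid on llama

set_mode() {
    case "$1" in
        start)
            FORWARD=1       # written to /proc/sys/net/ipv4/ip_forward
            POLICY=ACCEPT   # matches "everything else is open"
            RULING=A        # -$RULING becomes -A (append rule)
            ;;
        stop)
            FORWARD=0       # turn forwarding back off
            POLICY=DROP     # assumed; the real stop policy is not shown
            RULING=D        # -$RULING becomes -D (delete the same rule)
            ;;
        *)
            echo "Usage: set_mode {start|stop}" >&2
            return 1
            ;;
    esac
}
```

With something like this in place, "set_mode start" ahead of the rules
would expand -$RULING to -A, and "set_mode stop" would expand it to -D so
the same lines remove what start added.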


Network traffic is pretty light, and the machine will crash at nearly
any time: under high, low, or no load or network traffic.  The error,
for me, is in the SCSI driver, but sometimes also manifests itself on
the PCI bus (lost interrupt).

I've replaced nearly every component in the system with parts from a
third box I have, to no avail.  The third box was running RedHat
flawlessly until the power supply died.  Everything that was in there
has been moved into the existing box, with the exception of a floppy
drive, which was salvaged for the currently running RedHat box because
the one that was in the replacement was dead.

I've also played with the SCSI chain to no avail - with external
termination, without termination, without the bracket present at all. 
All had no effect on the crashing.

At this point it boils down to two potential problems:  The PCI ethernet
card, and the internal hard drive.  I'm currently running kernel
2.3.99-pre6.3 (2.3.99-pre5 is BROKEN) and have the same problems as with
the debian kernel, and my other custom kernels from 2.2.12 - 2.2.14.  I
believe the ethernet card should be functioning fine, as I think it was
used in one of these before.

So I guess the question is: Do you have an internal hard drive?

> After checking around on the kernel lists/NFS howtos etc it seems
> that the way forward might be to try the kernel nfs daemon, knfsd
> instead of the nfs-server package. I was wondering if anyone has
> had much success with this on an Alpha and whether it's up to
> running on a production machine? Any other ideas would be
> appreciated..

I'm using NFS from RedHat 5.2 without incident.  Usage has been light to
nonexistent as of late, but now that I have autofs working from NIS it
will be much higher.  They seem to be using rpc.nfsd as the nfs server -
"ps auwx | grep nfs" shows only it running, and "locate knfs" gives me a
single file from the kernel sources.

I'm running a custom 2.2.14 kernel on this box (same kernel as on the
box that crashes constantly, before the 2.3.99-pre6.3 upgrade).

Christopher

