Re: Install by NFS fails (while network works)
On Fri, Feb 05, 1999 at 01:24:42AM +0100, Loic Prylli wrote:
>
> Stephane Bortzmeyer writes:
> >
> > [No emergency, I finally installed from a CD and it worked.]
> >
> > Trying to install the base system with NFS, I get:
> >
> > mount RPC: timed out
> >
> > when I try to mount the disk from the shell (the installation program just
> > says there was an error).
> >
> > What is odd is that the network works (I can ping) and the mount
> > apparently succeeded (here is what the NFS server says):
>
>
> Bonjour Stephane,
>
> This problem also occurred to Oscar Levi recently; he suspected
> it was because the clocks of the NFS client and server were not
> synchronized.
>
> Well, trying to reproduce it, I tend to confirm this is related to a
> wrong wallclock setting, but in a very subtle way (both RPC and NFS
> normally make no assumption about clock synchronisation between client and
> server):
> Here is what I suspect:
> - due to current bad handling of Alpha CMOS, we sometimes end up in
> year 1930, or some other year before 1970 which is represented by a
> negative integer.
> - RPC uses the seconds returned by gettimeofday to construct an id
> - this id is generally stored in a long variable, but goes through a ton
> of htonl, ntohl, 32-bit casts and 64-bit casts on its way to the peer and
> back (while on the network it is 32 bits), and at the end it is used to
> match the peer's replies. It generally works well
> because the id will generally be a positive value that fits into 31
> bits. But a negative time will cause a negative id to be
> generated, which causes undefined behaviour in the upper 32 bits with
> ntohl (the behaviour is not the same when you compile with or without
> optimisation). So the final comparison should ignore the upper 32 bits,
> which is not done currently.
> - this causes RPC-based programs (and so mountd) to ignore any replies
> from a server, and then to time out.
> - I have tried to cast to 32 bit before doing the comparisons in
> glibc/sunrpc/pmap_rmt.c and glibc/sunrpc/clnt_tcp.c. I am currently
> recompiling the libc to see if it solves the problem.
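
The mismatch suspected above can be sketched in a few lines of C. This is
only an illustration under stated assumptions, not the actual glibc sunrpc
code: the function names (make_xid, to_wire, from_wire, match_naive,
match_32bit) are hypothetical, int64_t stands in for Alpha's 64-bit long,
and the received id is assumed to be zero-extended, which is just one of the
possible outcomes Loic describes.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Hypothetical: derive a transaction id from a seconds value.
   int64_t emulates the 64-bit `long` found on Alpha. */
static int64_t make_xid(int64_t seconds)
{
    return seconds;
}

/* Only the low 32 bits of the id travel on the network. */
static uint32_t to_wire(int64_t xid)
{
    return htonl((uint32_t)xid);
}

/* Assumption: the reply's id is zero-extended back to 64 bits. */
static int64_t from_wire(uint32_t wire)
{
    return (int64_t)ntohl(wire);
}

/* Compares all 64 bits: fails when the sent id was negative. */
static int match_naive(int64_t sent, int64_t received)
{
    return sent == received;
}

/* Compares only the 32 bits that were actually on the wire. */
static int match_32bit(int64_t sent, int64_t received)
{
    return (uint32_t)sent == (uint32_t)received;
}
```

With a 1999 timestamp such as 918174282 both comparisons agree after a
round trip through to_wire and from_wire. With a pre-1970 value such as
-1262304000 (roughly the year 1930), match_naive rejects the reply while
match_32bit accepts it, which is the effect of casting to 32 bits before
comparing.
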
This is the kind of thing I, too, suspect. I am rebuilding the kernel
to verify that the correct time permits NFS mounts. I believe that I
was steered astray by someone on the kernel list who said that Linux
said (are you getting the picture) that NFS was broken in 2.0.35.
Now, I believe that the 2.0.35 NFS is fine and that this time problem
is what confounded me all along.
Builds take a LONG time on UDBs, so I suspect it will be days before
we have conclusive answers.
Thanks for the sincere efforts.