
Re: Install by NFS fails (while network works)



On Fri, Feb 05, 1999 at 01:24:42AM +0100, Loic Prylli wrote:
> 
> Stephane Bortzmeyer writes:
>  > 
>  > [No emergency, I finally installed from a CD and it worked.]
>  > 
>  > Trying to install the base system with NFS, I get:
>  > 
>  > mount RPC: timed out
>  > 
>  > when I try to mount the disk from the shell (the installation program just 
>  > says there was an error).
>  > 
>  > What is odd is that the network works (I can ping) and the mount 
>  > apparently succeeded (here is what the NFS server says):
> 
> 
> Bonjour Stephane,
> 
> This problem also occurred to Oscar Levi recently; he suspected
> it was because the clocks of the NFS client and server were not
> synchronized.
> 
> Well, trying to reproduce it, I tend to confirm this is related to a
> wrong wallclock setting, but in a very subtle way (both RPC and NFS
> normally make no assumption about clock synchronisation between client and
> server):
> Here is what I suspect:
> - due to current bad handling of Alpha CMOS, we sometimes end up in
> year 1930, or some other year before 1970 which is represented by a
> negative integer.  
> - RPC uses the seconds returned by gettimeofday to construct an id
> - this id is generally stored in a long variable, but goes through a ton 
> of htonl, ntohl, 32-bit casts, 64-bit casts, to the peer and back (while
> on the network it is 32 bits), and at the end it is used to do the
> matching with the peer's replies. It generally works well
> because the id will generally be a positive value that fits into 31
> bits. But having a negative time will cause a negative id to be
> generated, which causes undefined behaviour on the upper 32 bits with
> ntohl (the behaviour is not the same when you compile with or without
> optimisation). So the final comparison should ignore the upper 32 bits,
> which is not done currently.
> - this causes RPC-based programs (and so mountd) to ignore any replies
> from a server, and then to time out.
> - I have tried to cast to 32 bit before doing the comparisons in
> glibc/sunrpc/pmap_rmt.c and glibc/sunrpc/clnt_tcp.c. I am currently
> recompiling the libc to see if it solves the problem.

This is the kind of thing I, too, suspect.  I am rebuilding the kernel
to verify that the correct time permits NFS mounts.  I believe that I
was steered astray by someone on the kernel list who said that Linux
said (are you getting the picture) that NFS was broken in 2.0.35.
Now, I believe that the 2.0.35 NFS is fine and that this time problem
is what confounded me all along.

Builds take a LONG time on UDBs, so I suspect it will be days before
we have conclusive answers.

Thanks for the sincere efforts.

