
Install by NFS fails (while network works)

Stephane Bortzmeyer writes:
 > [No emergency, I finally installed from a CD and it worked.]
 > Trying to install the base system with NFS, I get:
 > mount RPC: timed out
 > when I try to mount the disk from the shell (the installation program just 
 > says there was an error).
 > What is odd is that the network works (I can ping) and the mount 
 > apparently succeeded (here is what the NFS server says):

Bonjour Stephane,

This problem also occurred to Oscar Levi recently; he suspected
it was because the clocks of the NFS client and server were not
synchronised.

While trying to reproduce it, I tend to confirm that this is related to
a wrong wall-clock setting, but in a very subtle way (both RPC and NFS
normally make no assumption about clock synchronisation between client
and server).
Here is what I suspect:
- due to the current bad handling of the Alpha CMOS, we sometimes end
up in year 1930, or some other year before 1970, which is represented
by a negative integer.
- RPC uses the seconds returned by gettimeofday to construct an id.
- this id is generally stored in a long variable, but goes through a ton
of htonl, ntohl, 32-bit casts and 64-bit casts on its way to the peer
and back (on the network it is 32 bits), and at the end it is used to
match the peer's replies. It generally works well because the id is
usually a positive value that fits into 31 bits. But a negative time
causes a negative id to be generated, which causes undefined behaviour
in the upper 32 bits with ntohl (the behaviour is not the same when you
compile with or without optimisation). So the final comparison should
ignore the upper 32 bits, which is not done currently.
- this causes RPC-based programs (and so mountd) to ignore any replies
from a server, and then to time out.
- I have tried casting to 32 bits before doing the comparisons in
glibc/sunrpc/pmap_rmt.c and glibc/sunrpc/clnt_tcp.c. I am currently
recompiling the libc to see if it solves the problem.
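
To make the suspected mechanism concrete, here is a small model of the
id round trip. It is only a sketch and not the actual glibc code: the
names xid_long, xid_to_wire, xid_from_wire, xid_match_naive and
xid_match_32 are made up for illustration, int64_t stands in for a
64-bit long on Alpha, and the receiving side is assumed to zero-extend
the 32-bit wire value (one of the possible outcomes, since the real
behaviour seems to depend on the optimisation level).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit `long` transaction id on Alpha. */
typedef int64_t xid_long;

/* Only the low 32 bits of the id travel on the network. */
static uint32_t xid_to_wire(xid_long xid)
{
    return (uint32_t)xid;   /* well-defined: reduced modulo 2^32 */
}

/* Assumption: the reply's 32-bit id is zero-extended into a 64-bit long. */
static xid_long xid_from_wire(uint32_t wire)
{
    return (xid_long)wire;
}

/* Full 64-bit comparison: fails when the sent id was negative. */
static bool xid_match_naive(xid_long sent, xid_long received)
{
    return sent == received;
}

/* Comparison restricted to the low 32 bits, ignoring the upper half. */
static bool xid_match_32(xid_long sent, xid_long received)
{
    return (uint32_t)sent == (uint32_t)received;
}
```

With an id derived from a clock set to 1 January 1930 (-1262304000
seconds), xid_match_naive rejects the reply that comes back through
xid_from_wire(xid_to_wire(id)), while xid_match_32 accepts it. With a
positive id both comparisons agree, which would explain why the problem
only shows up when the clock is set before 1970.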


