Re: NFS Failover
On 06/26/2013 09:11 PM, David Parker wrote:
> Hello,
> 
> I'm wondering if there is a way to set up a highly-available NFS share
> using two servers (Debian Wheezy), where the shared volume can fail over
> if the primary server goes down.  My idea was to use two NFS servers and
> keep the exported directories in sync using DRBD.  On the client, mount
> the share via autofs using the "Replicated Server" syntax.
> 
> For example, say I have two servers called server1 and server2, each of
> which is exporting the directory /export/data via NFS, and /export/data
> is a synced DRBD filesystem shared between them.  On the client, set
> up an autofs map file to mount the share and add this line:
> 
>     /mnt/data    server1,server2:/export/data
> 
> This is close, but it doesn't do what I'm looking to do.  This seems to
> round-robin between the two servers whenever the filesystem needs to be
> mounted, and if the selected server isn't available, it then tries the
> other one.
> 
> What I'm looking for is a way to have the client be aware of both
> servers, and gracefully fail over between them.  I thought about using
> Pacemaker and Corosync to provide a virtual IP which floats between the
> servers, but would that work with NFS?  Let's say I have an established
> NFS mount and server1 fails, and the virtual IP fails over to server2.
> Wouldn't there be a bunch of NFS socket and state information which
> server2 is unaware of, therefore rendering the connection useless on the
> client?  Also, data integrity is essential in this scenario, so what
> about active writes to the NFS share which are happening at the time the
> server-side failover takes place?
> 
> In full disclosure, I have tried the autofs method but not the
> Pacemaker/Corosync HA method, so some experimentation might answer my
> questions.  In the meantime, any help would be greatly appreciated.
> 
>     Thanks!
>     Dave
I have also studied NFS failover with Pacemaker/Corosync/DRBD. It could
work with NFSv3; NFSv4 uses TCP, which makes things very hard. But even
with NFSv3 I ran into strange situations, the details of which I no
longer remember. The bottom line is that I decided NFS failover is too
fiddly and too hard to control reliably. I'm now looking at using
Gluster to replicate data between the nodes and mounting the Gluster
volumes on the clients via glusterfs. This seems like a much better,
simpler and more robust approach. I suggest you take a look at Gluster;
it's an exceptionally good technology.
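
While experimenting, it can help to see which of the two NFS servers
is actually answering. The following is only a rough sketch in Python,
not how autofs itself selects a server: it tries each host in order and
returns the first one that accepts a TCP connection on port 2049 (the
standard NFS port). The host names server1 and server2 come from the
question above; the function name first_reachable and the timeout value
are assumptions made for illustration. A successful connection only
shows that the port is open, not that the export is healthy or that
client state survived a failover.

```python
import socket

NFS_PORT = 2049  # standard NFS port; adjust if your servers differ


def first_reachable(servers, port=NFS_PORT, timeout=2.0):
    """Return the first host that accepts a TCP connection, or None.

    Hypothetical helper: this is a plain connectivity probe, not an
    NFS-level health check.
    """
    for host in servers:
        try:
            # Open and immediately close a TCP connection to the host.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            # Covers refused connections, timeouts and DNS failures.
            continue
    return None


# Example call, using the host names from the question:
#   first_reachable(["server1", "server2"])
```

Hosts are tried in list order, so putting the preferred server first
gives a fixed priority rather than a round-robin.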
-- 
Adrian Fita