
Re: sshfs: problem with rsync



On 2015-04-07 22:34:32 +0200, Pierre Frenkiel wrote:
> On Tue, 7 Apr 2015, Vincent Lefevre wrote:
> >With sshfs, you don't need a remote rsync because the rsync
> >synchronization is entirely done on the local side (sshfs does
> >the additional transfer to the remote side), but unless sshfs
> >has rsync like optimizations (I doubt), it will be much slower
> >because it will have to copy the whole files and not just the
> >changes. I've never compared, though.
> 
>   I thought that copying to a mounted file system is equivalent to copying
>   on a local one, so that rsync worked the same way in both cases.
>   I'll check that when I have some time.

Let's take an example. Suppose that, after an rsync, you have a big
file named "bigfile", and you add a character at the beginning of
this file on the local side. When you use rsync, it will detect
that you have just added a character, so it will transfer very
little data to the server rsync.

What the server rsync must do here is something like:
  1. Create a new file "bigfile.tmp" (I don't know the exact filename,
     it doesn't matter) containing the contents of the new file.
  2. Remove "bigfile".
  3. Rename "bigfile.tmp" to "bigfile".
The reason is that there's no system call to shift data in a file.
Hence the need for a temporary file[*]. So, you'll have a lot of disk
activity on the remote side. But the point is that between the
client rsync and the server rsync, very little data is transferred.

[*] One could also overwrite "bigfile", with the drawback of possible
data loss in case of failure. But the performance behavior would be
the same anyway.
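
To make the three steps above concrete, here is a minimal sketch in
Python. It is not rsync's actual code: the function name and the
".tmp-" prefix are made up for illustration. It shows that inserting
a single byte at the beginning forces the whole file to be written
again, whichever machine this runs on.

```python
import os
import shutil
import tempfile


def prepend_via_temp_file(path, prefix):
    """Insert `prefix` at the start of `path` by rewriting the whole file.

    Returns the number of bytes written to the temporary file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: create a temporary file in the same directory and fill it
    # with the complete new contents (prefix + all of the old data).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(path, "rb") as old:
            tmp.write(prefix)
            shutil.copyfileobj(old, tmp)
            written = tmp.tell()
        # Steps 2 and 3: os.replace() renames the temporary file over the
        # old one, which removes the old file as part of the rename.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return written
```

For a file of N bytes and a 1-byte prefix, the function writes N + 1
bytes, even though only one byte is new.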

Now, consider your case where the server rsync is also on the local
side, but the transfer to the remote machine is done via sshfs. So,
sshfs will see the following operations:
  1. Create a new file "bigfile.tmp" (I don't know the exact filename,
     it doesn't matter) containing the contents of the new file.
  2. Remove "bigfile".
  3. Rename "bigfile.tmp" to "bigfile".
and will propagate these operations to the remote machine. So, unless
there's something I'm missing (a specific optimization, but I don't see
how), a lot of data will be transferred due to Step 1.

Note: if you run some tests, make sure that "bigfile" has data that
cannot be compressed (or only very little).
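
One possible way to build such a test file, as a sketch in Python:
fill it with bytes from os.urandom(), which are essentially
incompressible, then check a sample with zlib. The two function names
and the chunk and sample sizes are arbitrary choices for this example.
The check assumes a non-empty file.

```python
import os
import zlib


def make_incompressible_file(path, size, chunk_size=1 << 20):
    """Write `size` random bytes to `path`, one chunk at a time."""
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n


def compression_ratio(path, sample_size=1 << 20):
    """Return compressed size / original size for the start of the file.

    A value close to 1.0 (or slightly above) means that compression
    will not hide the amount of data actually transferred.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return len(zlib.compress(sample)) / len(sample)
```

For example, make_incompressible_file("bigfile", 100 * 1024 * 1024)
creates a 100 MiB file, and compression_ratio("bigfile") should then
be very close to 1.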

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

