
Re: Live File System Backup



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, May 07, 2017 at 10:53:51AM +1200, Ben Caradoc-Davies wrote:

[also a reply to Henrique, elsewhere in this thread]

> If a file is updated while it is being copied, it may contain only
> half a change set and be in an internally inconsistent state,
> perhaps making it unusable as a backup. Writes are typically not
> atomic. The same problem applies to collections of files that
> reference each other.
> 
> Kind regards,

Ben, Henrique -- no questions. This is what I subsumed under
"skew": application state may be dispersed across different
places in a file, across different files or even partly not
in files at all (e.g. in RAM: imagine a BTree with just parts
of its pointer structure not yet committed to disk).

Of course you can't ever win unless you collaborate with the
application in those cases (even with magic file systems like
ZFS or btrfs).
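
A backup tool on its own can at best notice the crudest kind of
skew: that a file changed while it was being read. A minimal
Python sketch of that check follows; the name copy_if_stable
and the retry count are made up for illustration. It only sees
changes that show up in size or mtime and says nothing about
whether the contents make sense to the application.

```python
import os
import shutil


def copy_if_stable(src, dst, retries=3):
    """Copy src to dst, retrying if src changed during the copy.

    Returns True if one attempt saw the same size and mtime before
    and after the copy, False if every attempt raced with a writer.
    """
    for _ in range(retries):
        before = os.stat(src)
        shutil.copyfile(src, dst)
        after = os.stat(src)
        unchanged = (before.st_size, before.st_mtime_ns) == (
            after.st_size, after.st_mtime_ns)
        if unchanged:
            return True
    return False
```

A False result means "do not trust this copy"; a True result
only means no writer was caught in the act.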

Then there is this subtle "file data" and "file metadata"
thing, which is an issue even with carefully designed applications
and file systems. It's even difficult to reach a consensus on
what is "right": remember the ext3/ext4 data loss episode[1]?

This is where snapshotting magic, be it built-in (zfs, btrfs)
or bolted-on (overlayfs, lvm) might help a bit: freeze a snapshot,
back up that (in the first case, the file systems provide a native
way to do that, in the second case, rsync is a pretty viable
way of doing things). I said "might help a bit" because the
ultimate consistency criterion is the application! A consistent
file system view might just be this truncated-to-zero file;
only the application "knows" at that point (e.g. by keeping
its data in an already unlinked file which is still open,
or somewhere in RAM, or...).
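
For the bolted-on case, the "freeze a snapshot, back up that"
sequence could look roughly like the Python sketch below. It
assumes an LVM logical volume and shells out to lvcreate, mount,
rsync, umount and lvremove. The snapshot name, its size and the
run parameter (there so the command sequence can be inspected
without root) are illustrative assumptions, and some file systems
need extra mount options for a snapshot.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def lvm_snapshot(vg, lv, mountpoint, size="1G", run=subprocess.check_call):
    """Create a snapshot of /dev/<vg>/<lv>, mount it read-only, clean up."""
    snap = lv + "_backup_snap"  # hypothetical snapshot name
    snap_dev = "/dev/{}/{}".format(vg, snap)
    run(["lvcreate", "--snapshot", "--size", size, "--name", snap,
         "/dev/{}/{}".format(vg, lv)])
    try:
        run(["mount", "-o", "ro", snap_dev, mountpoint])
        try:
            yield mountpoint
        finally:
            run(["umount", mountpoint])
    finally:
        # Remove the snapshot even if mounting or copying failed.
        run(["lvremove", "-f", snap_dev])


def backup(vg, lv, mountpoint, dest, run=subprocess.check_call):
    """Copy the frozen view to dest with rsync."""
    with lvm_snapshot(vg, lv, mountpoint, run=run) as frozen:
        # Trailing slash: copy the snapshot's contents, not the directory.
        run(["rsync", "-a", "--delete", frozen + "/", dest])
```

What ends up in dest is exactly the file system view at snapshot
time, truncated-to-zero files included.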

So your choices are

 - for the applications you really care about, look into
   what they are doing. Grown up apps will support you in
   that (I gave the PostgreSQL example above). Typically
   you can wrap the backup process in guards like ("keep
   your on-disk state consistent"[1]..."now you can relax").
   Note that to avoid races this structure is more or less
   necessary. The only real difference from the "magic
   snapshot" thing is that the latter happens very quickly.

 - for all the others... just relax.

Otherwise, "on line" backup is simply not an option.
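
The guard structure from the first choice can be sketched in
Python as below. The names backup_guard, begin and end are
placeholders: the two callables stand for whatever the
application offers. For PostgreSQL they would issue its backup
start and stop calls (pg_start_backup/pg_stop_backup at the time
of writing, renamed pg_backup_start/pg_backup_stop in version 15);
that wiring is left out here.

```python
from contextlib import contextmanager


@contextmanager
def backup_guard(begin, end):
    """Bracket a backup with application-provided begin/end hooks."""
    begin()  # "keep your on-disk state consistent"
    try:
        yield
    finally:
        end()  # "now you can relax", even if the copy failed
```

The file copy itself goes inside "with backup_guard(...):", so
the application is always released again, whether the copy
succeeds or not.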

cheers

[1] That doesn't mean necessarily frozen. PostgreSQL, for example,
   continues writing to the WAL, it just eats through its storage
   at a higher pace.

- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlkO7SgACgkQBcgs9XrR2kYf9ACeM2njgrSttOUPRk4D6fJqJtjQ
qmkAn38VbkKiOlADe+33teN8uzcbLa2C
=uKNk
-----END PGP SIGNATURE-----

