
Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)



On 11/11/22 00:43, hw wrote:
On Thu, 2022-11-10 at 21:14 -0800, David Christensen wrote:
On 11/10/22 07:44, hw wrote:
On Wed, 2022-11-09 at 21:36 -0800, David Christensen wrote:
On 11/9/22 00:24, hw wrote:
On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:

Taking snapshots is fast and easy.  The challenge is deciding when to
destroy them.

That seems like an easy decision: just keep as many as you can and destroy the ones you can't keep.


As with most filesystems, performance of ZFS drops dramatically as you approach 100% usage. So, you need a data destruction policy that keeps storage usage and performance at acceptable levels.
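For example, something like the following should show how close the pool is to full and where the space is going (the pool name 'p1' is from my backup server; substitute yours):

# zpool list -o name,size,allocated,free,capacity p1
# zfs list -o space -r p1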


Lots of snapshots slows down commands that involve snapshots (e.g. 'zfs list -r -t snapshot ...'). This means sysadmin tasks take longer when the pool has more snapshots.
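For example, a quick way to count the snapshots in a pool:

# zfs list -H -r -t snapshot -o name p1 | wc -l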


I have considered switching to one Intel Optane Memory
Series and a PCIe 4x adapter card in each server [for a ZFS cache].

Isn't that very expensive, and doesn't it wear out just as well?


The Intel Optane Memory Series products are designed to be cache devices -- when using compatible hardware, Windows, and Intel software. My hardware should be compatible (Dell PowerEdge T30), but I am unsure if FreeBSD 12.3-R will see the motherboard NVMe slot or an installed Optane Memory Series product.
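If FreeBSD does see the device, I would expect adding it as a cache device to look something like this (nvd0 is a guess at the device name; it might also show up as nda0):

# nvmecontrol devlist
# zpool add p1 cache nvd0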


Intel Optane Memory M10 16 GB PCIe M.2 80mm are US $18.25 on Amazon.


Intel Optane Memory M.2 2280 32GB PCIe NVMe 3.0 x2 are US $69.95 on Amazon.


Wouldn't it be better to have the cache in RAM?


Adding memory should help in more ways than one, and it might reduce ZFS cache device usage, but I am not certain. More RAM will not, however, address the excessive wear problem of using a desktop SSD as a ZFS cache device.
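For what it's worth, ZFS already caches in RAM -- the ARC -- and by default it will grow to use roughly all of the otherwise-unused memory. On FreeBSD, its current size and ceiling can be checked with something like:

# sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max

(vfs.zfs.arc_max can be set in /boot/loader.conf to cap it, if that ever becomes necessary.)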


8 GB ECC memory modules to match the existing modules in my SOHO server are $24.95 each on eBay. I have two free memory slots.


Please run the relevant commands for LVM, btrfs, whatever, and post the output.

Well, what would that tell you?


That would provide accurate information about the storage configuration of your backup server.


Here is the pool in my backup server. mirror-0 and mirror-1 each use two Seagate 3 TB HDD's. dedup and cache each use partitions on two Intel SSD 520 Series 180 GB SSD's:

2022-11-11 20:41:09 toor@f1 ~
# zpool status p1
  pool: p1
 state: ONLINE
scan: scrub repaired 0 in 7 days 22:18:11 with 0 errors on Sun Sep 4 14:18:21 2022
config:

	NAME                              STATE     READ WRITE CKSUM
	p1                                ONLINE       0     0     0
	  mirror-0                        ONLINE       0     0     0
	    gpt/p1a.eli                   ONLINE       0     0     0
	    gpt/p1b.eli                   ONLINE       0     0     0
	  mirror-1                        ONLINE       0     0     0
	    gpt/p1c.eli                   ONLINE       0     0     0
	    gpt/p1d.eli                   ONLINE       0     0     0
	dedup	
	  mirror-2                        ONLINE       0     0     0
	    gpt/CVCV******D0180EGN-2.eli  ONLINE       0     0     0
	    gpt/CVCV******7K180EGN-2.eli  ONLINE       0     0     0
	cache
	  gpt/CVCV******D0180EGN-1.eli    ONLINE       0     0     0
	  gpt/CVCV******7K180EGN-1.eli    ONLINE       0     0     0

errors: No known data errors
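For reference, dedup and cache vdevs like those can be added to an existing pool with commands along these lines (the GPT labels here are placeholders, not my real ones):

# zpool add p1 dedup mirror gpt/ssd0-dedup.eli gpt/ssd1-dedup.eli
# zpool add p1 cache gpt/ssd0-cache.eli gpt/ssd1-cache.eli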


I suggest creating a ZFS pool with a mirror vdev of two HDD's.  If you can get past your dislike of SSD's, add a mirror of two SSD's as a dedicated dedup vdev.  (These will not see the hard usage that cache devices get.)  Create a filesystem 'backup'.  Create child filesystems, one for each host.  Create grandchild filesystems, one for the root filesystem on each host.

Huh?  What's with these relationships?


ZFS datasets can be organized into hierarchies. Child datasets can inherit properties from the parent dataset. Commands can be applied to an entire hierarchy by specifying the top dataset and using a "recursive" option. Etc.
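For example, a property set on 'backup' flows down to the per-host filesystems unless overridden, and one recursive command snapshots the whole tree (the names follow the layout suggested above):

# zfs set compression=lz4 p1/backup
# zfs get -r compression p1/backup
# zfs snapshot -r p1/backup@2022-11-11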


When a host is decommissioned and you no longer need the backups, you can destroy the backups for just that host. When you add a new host, you can create filesystems for just that host. You can use different backup procedures for different hosts. Etc.
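For example (the hostnames are made up):

# zfs create p1/backup/newhost
# zfs create p1/backup/newhost/root
# zfs destroy -r p1/backup/oldhost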


Set up daily rsync backups of the root filesystems on the various hosts to the ZFS grandchild filesystems.  Set up zfs-auto-snapshot to take daily snapshots of everything, and retain 10 snapshots.  Then watch what happens.

What do you expect to happen?


I expect the first full backup and snapshot will use an amount of storage that is something less than the sum of the sizes of the source filesystems (due to compression). The second through tenth backups and snapshots will each increase the storage usage by something less than the sum of the daily churn of the source filesystems. On day 11, and every day thereafter, the oldest snapshot will be destroyed, daily churn will be added, and usage will stabilize. Any source system upgrades and software installs will cause an immediate backup storage usage increase. Any source system cleanings and software removals will cause a backup storage usage decrease after 10 days.
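To make that concrete, a daily job along these lines would produce that pattern. This is only a sketch, not my actual script; the hostname and mountpoint are assumptions, and the hand-rolled pruning at the end stands in for what zfs-auto-snapshot would normally handle:

#!/bin/sh
# Sketch of a daily backup job for one host.  Assumes the dataset
# layout suggested above, mounted at /p1/backup.
FS="p1/backup"
HOST="examplehost"
KEEP=10

# Pull the host's root filesystem into its grandchild dataset.
rsync -aHx --delete root@${HOST}:/ /${FS}/${HOST}/root/

# Take a recursive snapshot of the whole backup tree, named by date.
zfs snapshot -r ${FS}@$(date +%Y-%m-%d)

# Destroy the oldest snapshots, keeping the newest $KEEP.
TOTAL=$(zfs list -H -d 1 -t snapshot -o name ${FS} | wc -l)
PRUNE=$((TOTAL - KEEP))
if [ ${PRUNE} -gt 0 ]; then
	zfs list -H -d 1 -t snapshot -o name -s creation ${FS} |
	head -n ${PRUNE} |
	while read SNAP; do
		zfs destroy -r "${SNAP}"
	done
fi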


I'm thinking about changing my backup server ...
In any case, I need to do more homework first.


Keep your existing backup server and procedures operational. If you do not have offline copies of your backups (e.g. drives in removable racks, external drives), implement that now.


Then work on ZFS. ZFS looks simple enough going in, but you soon realize that it has a large feature set, new concepts, and a non-trivial learning curve. Incantations get long and repetitive; you will want to script common tasks. Expect to make mistakes. It would be wise to do your ZFS evaluation in a VM. Using a VM would also allow you to use any OS supported by the hypervisor (which may work around the problem of FreeBSD not having drivers for the HP Smart Array P410).
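A cheap way to practice -- inside a VM or not -- is a throwaway pool backed by plain files, something like:

# truncate -s 1G /tmp/d0 /tmp/d1
# zpool create test mirror /tmp/d0 /tmp/d1
# zpool destroy test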


David

