
Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)



On 11/14/22 13:48, hw wrote:
On Fri, 2022-11-11 at 21:55 -0800, David Christensen wrote:

Lots of snapshots slows down commands that involve snapshots (e.g.  'zfs
list -r -t snapshot ...').  This means sysadmin tasks take longer when
the pool has more snapshots.

Hm, how long does it take?  It's not like I'm planning on making hundreds of
snapshots ...

2022-11-14 18:00:12 toor@f3 ~
# time zfs list -r -t snapshot bootpool | wc -l
      49

real	0m0.020s
user	0m0.011s
sys	0m0.012s

2022-11-14 18:00:55 toor@f3 ~
# time zfs list -r -t snapshot soho2_zroot | wc -l
     222

real	0m0.120s
user	0m0.041s
sys	0m0.082s

2022-11-14 18:01:18 toor@f3 ~
# time zfs list -r -t snapshot p3 | wc -l
    3864

real	0m0.649s
user	0m0.159s
sys	0m0.494s


I surprised myself -- I recall p3 taking 10+ seconds to list all the snapshots. But I added another mirror since then, I try to destroy old snapshots periodically, and the machine has been up for 16+ days (so metadata is likely cached).
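
In case it helps, bulk pruning can be done with a pipeline roughly like the one below -- keep the newest 30 snapshots of the pool, dry-run destroy the rest. The pool name and the retention count are just examples; -n makes each destroy a dry run, so drop it once the output looks right:

# zfs list -H -t snapshot -o name -s creation -r p3 | \
      awk -v keep=30 '{ s[NR] = $1 } END { for (i = 1; i <= NR - keep; i++) print s[i] }' | \
      xargs -n 1 zfs destroy -nv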


The Intel Optane Memory Series products are designed to be cache devices
-- when using compatible hardware, Windows, and Intel software.  My
hardware should be compatible (Dell PowerEdge T30), but I am unsure if
FreeBSD 12.3-R will see the motherboard NVMe slot or an installed Optane
Memory Series product.

Try it out?


Eventually, yes.


I thought Optane came as very expensive PCIe cards.  I don't have any M.2 slots,
and it seems difficult to even find mainboards with at least two slots that support
the same cards, which would be a requirement because there's no storing data
without redundancy.


I was thinking of getting an NVMe M.2 SSD to PCIe x4 adapter card for the machines without a motherboard M.2 slot.


# zpool status
   pool: moon
  state: ONLINE
config:

         NAME        STATE     READ WRITE CKSUM
         moon        ONLINE       0     0     0
           mirror-0  ONLINE       0     0     0
             sdc     ONLINE       0     0     0
             sdg     ONLINE       0     0     0
           raidz1-1  ONLINE       0     0     0
             sdl     ONLINE       0     0     0
             sdm     ONLINE       0     0     0
             sdn     ONLINE       0     0     0
             sdp     ONLINE       0     0     0
             sdq     ONLINE       0     0     0
             sdr     ONLINE       0     0     0
           raidz1-2  ONLINE       0     0     0
             sdd     ONLINE       0     0     0
             sde     ONLINE       0     0     0
             sdf     ONLINE       0     0     0
             sdh     ONLINE       0     0     0
             sdi     ONLINE       0     0     0
             sdj     ONLINE       0     0     0
           mirror-3  ONLINE       0     0     0
             sdk     ONLINE       0     0     0
             sdo     ONLINE       0     0     0


Some of the disks are 15 years old ...  It made sense to me to group the disks
by the ones that are the same (size and model) and use raidz or mirror depending
on how many disks there are.

I don't know if that's ideal.  Would ZFS have figured it out by itself if I had
added all of the disks in a single raidz?  With two groups of only two disks each,
that might have wasted space?


So, 16 HDD's of various sizes?


Without knowing the interfaces, ports, and drives that correspond to devices sd[cdefghijklmnopqr], it is difficult to comment. I do find it surprising that you have two mirrors of 2 drives each and two raidz1's of 6 drives each.


If you want maximum server IOPS and bandwidth, lay out your pool of 16 drives as 8 mirrors of 2 drives each. Try to match the sizes of the drives in each mirror. It is okay if the mirrors are not all the same size. ZFS will proportion writes to top-level vdev's based upon their available space. Reads come from whichever vdev's have the data.
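
Creating such a layout from scratch would look roughly like this (device names are placeholders, and creating a new pool wipes whatever is on those disks):

# zpool create newpool \
      mirror sda sdb \
      mirror sdc sdd \
      mirror sde sdf \
      mirror sdg sdh \
      mirror sdi sdj \
      mirror sdk sdl \
      mirror sdm sdn \
      mirror sdo sdp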


When I built my latest server, I tried different pool layouts with 4 HDD's and ran benchmarks. (2 striped mirrors of 2 HDD's each was the winner.)


You can monitor pool I/O with:

# zpool iostat -v moon 10


On FreeBSD, top(1) includes ZFS ARC memory usage:

ARC: 8392M Total, 5201M MFU, 797M MRU, 3168K Anon, 197M Header, 2194M Other
     3529M Compressed, 7313M Uncompressed, 2.07:1 Ratio


Is the SSD cache even relevant for a backup server?


Yes, because the backup server is really a secondary server in a primary-secondary scheme. Both servers contain a complete set of data, backups, archives, and images. The primary server is up 24x7. I boot the secondary periodically and replicate. If the primary dies, I will swap roles and try to recover content that changed since the last replication.
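
For reference, replication with zfs send/receive looks roughly like the sketch below; the pool, snapshot, and host names are made up. -R sends the whole dataset tree, -I sends everything between the previous replicated snapshot and the new one, and the receive side rolls back (-F), keeps the tree layout (-d), and leaves the datasets unmounted (-u):

# zfs snapshot -r tank@2022-11-14
# zfs send -R -I tank@2022-11-01 tank@2022-11-14 | \
      ssh secondary zfs receive -Fduv backup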


I might have two unused
80GB SSDs I may be able to plug in to use as cache.


Split each SSD into two or more partitions. Add one partition on each SSD as a cache device for the HDD pool. Using another partition on each SSD, add a dedicated dedup mirror for the HDD pool.
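
Roughly like this -- the partition names are placeholders, and the dedup vdev class needs a reasonably recent OpenZFS (and only matters for datasets with dedup=on):

# zpool add moon cache sdX1 sdY1
# zpool add moon dedup mirror sdX2 sdY2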


I am thinking of using a third partition on each SSD to create a pool with one mirror of SSD partitions for a specific workload (CVS repository).
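
That would be a separate pool, e.g. (again, placeholder partition names):

# zpool create cvs mirror sdX3 sdY3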


How does that work with destroying the oldest snapshot?  IIRC,
when a snapshot is removed (destroyed?  That's strange wording, "merge" seems
better ...), it's supposed to somehow merge with the data it has been created
from such that the "first data" becomes what the snapshot was unless the
snapshot is destroyed (that wording would make sense then), meaning it doesn't
exist anymore without merging and the "first data" is still there as it was.


Rather than giving a wrong explanation, please RTFM zfs(8) 'zfs destroy [-dnpRrv] snapshot[%snapname][,...]'.
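
For example, a dry run that shows what destroying a whole range of snapshots would do, without touching anything -- the dataset and snapshot names here are made up; -n means nothing is actually destroyed and -v lists what would go:

# zfs destroy -nv tank/data@2022-01-01%2022-06-01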


I remember trying to do stuff with snapshots a long time ago and zfs would freak
out telling me that I can't merge a snapshot because there were other snapshots
that were getting in the way (as if I'd care, just figure it out yourself, darn
it, that's your job not mine ...) and it was a nightmare to get rid of those.


That sounds like clones. RTFM zfs(8) 'zfs destroy' and the -r and -R options.
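
In short, -r destroys same-named snapshots in all descendant datasets, while -R additionally destroys everything that depends on them, clones included. For example (hypothetical names):

# zfs destroy -R tank/data@2022-01-01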


David

