
Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)



On Wed, 2022-11-09 at 21:36 -0800, David Christensen wrote:
> On 11/9/22 00:24, hw wrote:
>  > On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:
> 
>  > Hmm, when you can backup like 3.5TB with that, maybe I should put 
> FreeBSD on my
>  > server and give ZFS a try.  Worst thing that can happen is that it 
> crashes and
>  > I'd have made an experiment that wasn't successful.  Best thing, I 
> guess, could
>  > be that it works and backups are way faster because the server 
> doesn't have to
>  > actually write so much data because it gets deduplicated and reading 
> from the
>  > clients is faster than writing to the server.
> 
> 
> Be careful that you do not confuse a ~33 GiB full backup set, and 78 
> snapshots over six months of that same full backup set, with a full 
> backup of 3.5 TiB of data.  I would suggest a 10 TiB pool to backup the 
> latter.

The full backup isn't deduplicated?

> Writing to a ZFS filesystem with deduplication is much slower than 
> simply writing to, say, an ext4 filesystem -- because ZFS has to hash 
> every incoming block and see if it matches the hash of any existing 
> block in the destination pool.  Storing the existing block hashes in a 
> dedicated dedup virtual device will expedite this process.

But when it needs to write almost nothing because almost everything gets
deduplicated, can't it be faster than having to write everything?
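
If I were to try it, I suppose enabling it would look roughly like this (pool
and device names are just placeholders):

  # turn deduplication on only for the dataset that benefits from it
  zfs set dedup=on tank/backup
  # optionally keep the dedup table on fast mirrored SSDs (OpenZFS 0.8+)
  zpool add tank dedup mirror /dev/sdx /dev/sdy
  # see how large the dedup table has grown
  zpool status -D tank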

>  >> I run my backup script each night.  It uses rsync to copy files and
>  >
>  > Aww, I can't really do that because my servers eats like 200-300W 
> because it has
>  > so many disks in it.  Electricity is outrageously expensive here.
> 
> 
> Perhaps platinum rated power supplies?  Energy efficient HDD's/ SSD's?

If you pay for it ... :)

Running it once in a while for some hours to make backups is still possible. 
Replacing the hardware is way more expensive.

> [...]
>  > Sounds like a nice setup.  Does that mean you use snapshots to keep 
> multiple
>  > generations of backups and make backups by overwriting everything 
> after you made
>  > a snapshot?
> 
> Yes.

I'm starting to think more and more that I should make use of snapshots.
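
If I understand it right, the nightly cycle would be roughly this (dataset and
host names made up):

  # overwrite last night's copy with the current state of the client
  rsync -aH --delete host1:/home/ /tank/backup/host1/home/
  # then freeze it; the snapshot only costs space for blocks that changed
  zfs snapshot tank/backup@$(date +%Y-%m-%d)
  # list the generations
  zfs list -t snapshot -r tank/backup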

>  > In that case, is deduplication that important/worthwhile?  You're not
>  > duplicating it all by writing another generation of the backup but 
> store only
>  > what's different through making use of the snapshots.
> 
> Without deduplication or compression, my backup set and 78 snapshots 
> would require 3.5 TiB of storage.  With deduplication and compression, 
> they require 86 GiB of storage.

Wow, that's quite a difference!  What makes the difference, the compression or
the deduplication?  With snapshots you only store the differences from one
snapshot to the next, so I'd expect there wouldn't be that many duplicates left
to deduplicate.
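
If I read the docs right, ZFS reports both ratios separately, so something like
this should show which one does the work (pool name assumed):

  # per-dataset compression ratio and logical vs. physical size
  zfs get compressratio,used,logicalused tank/backup
  # the pool-wide dedup ratio shows up in the DEDUP column
  zpool list tank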

>  > ... I only never got around to figure [ZFS snapshots] out because I 
> didn't have the need.
> 
> 
> I accidentally trash files on occasion.  Being able to restore them 
> quickly and easily with a cp(1), scp(1), etc., is a killer feature.

indeed

> Users can recover their own files without needing help from a system 
> administrator.

You have users who know how to get files out of snapshots?
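
If it works the way I think it does, they only need to copy out of the hidden
.zfs directory, roughly (paths and snapshot names made up):

  # every dataset exposes its snapshots under .zfs/snapshot
  ls /tank/home/alice/.zfs/snapshot/
  # pull a trashed file back out of yesterday's snapshot
  cp /tank/home/alice/.zfs/snapshot/2022-11-08/thesis.tex ~/thesis.tex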

>  > But it could also be useful for "little" things like taking a 
> snapshot of the
>  > root volume before updating or changing some configuration and being 
> able to
>  > easily to undo that.
> 
> 
> FreeBSD with ZFS-on-root has a killer feature called "Boot Environments" 
> that has taken that idea to the next level:
> 
> https://klarasystems.com/articles/managing-boot-environments/

That's really cool.  Linux is missing out on a lot by treating ZFS as an alien.
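
If I understand the article right, it boils down to bectl(8), roughly
(environment name made up):

  # save the current root as a boot environment before an upgrade
  bectl create pre-upgrade
  # if the upgrade goes wrong, point the loader back at it and reboot
  bectl activate pre-upgrade
  bectl list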

I guess btrfs could, in theory, make something like boot environments possible,
but you can't really boot from btrfs in the first place: it fails to boot as
soon as the boot volume is degraded, like when a disk has failed.  Then you're
screwed, because you can't log in through ssh to fix anything and have to
physically go to the machine to get it back up.  That's a non-option, so you
have to boot from something other than btrfs.

>  >> I have 3.5 TiB of backups.
> 
> 
> It is useful to group files with similar characteristics (size, 
> workload, compressibility, duplicates, backup strategy, etc.) into 
> specific ZFS filesystems (or filesystem trees).  You can then adjust ZFS 
> properties and backup strategies to match.

That's a good idea.
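
So something like this, with the properties set per dataset (names are just
examples):

  # home directories: compressible and full of duplicates
  zfs create -o compression=lz4 -o dedup=on tank/backup/homes
  # already-compressed archives and images: leave both off
  zfs create -o compression=off -o dedup=off tank/backup/archives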

>  >>>> For compressed and/or encrypted archives, image, etc., I do not use
>  >>>> compression or de-duplication
>  >>>
>  >>> Yeah, they wouldn't compress.  Why no deduplication?
>  >>
>  >>
>  >> Because I very much doubt that there will be duplicate blocks in 
> such files.
>  >
>  > Hm, would it hurt?
> 
> 
> Yes.  ZFS deduplication is resource intensive.

But you're using it already.

>  > Oh it's not about performance when degraded, but about performance. 
> IIRC when
>  > you have a ZFS pool that uses the equivalent of RAID5, you're still 
> limited to
>  > the speed of a single disk.  When you have a mysql database on such a ZFS
>  > volume, it's dead slow, and removing the SSD cache when the SSDs 
> failed didn't
>  > make it any slower.  Obviously, it was a bad idea to put the database 
> there, and
>  > I wouldn't do again when I can avoid it.  I also had my data on such 
> a volume
>  > and I found that the performance with 6 disks left much to desire.
> 
> 
> What were the makes and models of the 6 disks?  Of the SSD's?  If you 
> have a 'zpool status' console session from then, please post it.

They were (and still are) 6x4TB WD Red (though one or two have failed over time)
and two Samsung 850 PRO, IIRC.  I don't have an old session anymore.

These WD Reds are slow to begin with.  IIRC, both SSDs failed and I removed them.

The other instance didn't use SSDs but 6x2TB HGST Ultrastars.  Those aren't
exactly slow, but ZFS is slow anyway.

> Constructing a ZFS pool to match the workload is not easy.

Well, back then there wasn't much information because ZFS was a pretty new
thing.

>   STFW there 
> are plenty of articles.  Here is a general article I found recently:
> 
> https://klarasystems.com/articles/choosing-the-right-zfs-pool-layout/

Thanks!  If I make a zpool for backups (or anything else), I need to do some
reading beforehand anyway.

> MySQL appears to have the ability to use raw disks.  Tuned correctly, 
> this should give the best results:
> 
> https://dev.mysql.com/doc/refman/8.0/en/innodb-system-tablespace.html#innodb-raw-devices

Could MySQL 5.6 already do that?  I'll have to see if MariaDB can do it now
...
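
Looking at that page, the raw tablespace seems to go into my.cnf roughly like
this (device path is a placeholder; apparently it has to be initialised with
'newraw' on the first start and then switched to 'raw'):

  [mysqld]
  innodb_data_home_dir=
  # first start: let InnoDB initialise the raw partition
  innodb_data_file_path=/dev/sdb1:10Gnewraw
  # afterwards, change it to:
  # innodb_data_file_path=/dev/sdb1:10Graw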

> If ZFS performance is not up to your expectations, and there are no 
> hardware problems, next steps include benchmarking, tuning, and/or 
> adding or adjusting the hardware and its usage.

In theory, yes :)

I'm very reluctant to mess with the default settings of file systems.  When XFS
became available for Linux sometime in the 90s, I managed to lose data when an
XFS file system got messed up.  Fortunately, I was able to recover almost
everything from backups and from the file system itself.  I never really found
out what caused it, but a long time later I figured that I probably hadn't used
the mount options I should have used.  I had messed with the defaults for some
reason I don't remember.  That taught me a lesson.

>  >> ... invest in hardware to get performance.
> 
>  > Hardware like?
> 
> 
> Server chassis, motherboards, chipsets, processors, memory, disk host 
> bus adapters, disk racks, disk drives, network interface cards, etc..

Well, who's gonna pay for that?

>  > In theory, using SSDs for cache with ZFS should improve
>  > performance.  In practise, it only wore out the SSDs after a while, 
> and now it's
>  > not any faster without SSD cache.
> 
> 
> Please run 'zpool status' and post the console session (prompt, command 
> entered, output displayed).  Please correlate the vdev's to disk drive 
> makes and models.

See above ... The pool is a raidz1-0 with the 6x4TB Red drives, and no SSDs are
left.
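
If I ever rebuild it with ZFS, I suppose correlating the members to drive
models would be something like (device names are examples):

  zpool status -v
  # match each member device to its make/model
  smartctl -i /dev/sda | grep -E 'Model|Serial'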

> On 11/9/22 03:41, hw wrote:
> 
> > I don't have anything without ECC RAM, 
> 
> 
> Nice.

Yes :)  Buying used has its advantages.  You don't get the fastest, but you get
tons of ECC RAM, awesome CPUs, and reliability.

> > and my server was never meant for ZFS.
> 
> 
> What is the make and model of your server?

I put it together myself.  The backup server uses an MSI mainboard with the
designation S0121 C204 SKU in a Chenbro case that has a 16xLFF backplane.  It
has only 16GB RAM and would max out at 32GB.  Unless you want ZFS with
deduplication, that's more than enough for making backups :)

I could replace it with a Dell R720 to get more RAM, but those can take only
12xLFF.  I could buy a new Tyan S7012 WGM4NR for EUR 50 before they're sold out
and stuff at least 48GB RAM and two 5690 Xeons into it (they're supposed to go
into a Z800 I have sitting around and could try to sell, but I'm lazy).  Then
I'd probably also have to buy CPU coolers for it (I'm not sure the coolers of
the Z800 fit) and a new UPS because it would need so much power.  (I have the
48GB because it came in a server I bought just to get the 5690s, and there's
another 48GB in the Z800, but not all of it might fit ...)

It would be fun, but I don't really feel like throwing money at technology that
old just for making a backup once in a while.  If you can tell me that the
coolers of the Z800 definitely fit the Tyan board, I'll buy one and throw it
into my server.  It would be worth spending the EUR 50.  Hm, maybe I should find
out, but that'll be difficult ... and the fan connectors won't fit even if the
coolers do.  They're 4-pin. ... Ok, I could replace the fans, but I don't have
any 90mm fans.  Those shouldn't cost too much, though.

> > With mirroring, I could fit only one backup, not two.
> 
> 
> Add another mirror to your pool.  Or, use a process of substitution and 
> resilvering to replace existing drives with larger capacity drives.

Lol, I can't create a pool in thin air.  Wouldn't it be great if ZFS could do
that? :)  Use ambient air for storage ... just make sure the air doesn't escape
;)

There's nothing to resilver, the backup server is currently using btrfs.

Have you checked disk prices recently?  Maybe I'll get lucky on Black Friday,
but if I get some, they'll go into my active server.
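
For reference, if there ever is a pool to grow, I guess the two options would
look roughly like this (pool and device names made up):

  # widen the pool with another mirror vdev
  zpool add tank mirror /dev/sdg /dev/sdh
  # or swap in bigger drives one at a time and let each one resilver
  zpool replace tank /dev/sda /dev/sdi
  zpool status tank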

> > In any case, I'm currently tending to think that putting FreeBSD with ZFS on
> > my
> > server might be the best option.  But then, apparently I won't be able to
> > configure the controller cards, so that won't really work.
> 
> 
> What is the make and model of your controller cards?

They're HP Smart Array P410s.  FreeBSD doesn't seem to support those.

> [...]
> I have a Debian VM and no contrib ... hm, zfs-dkms and such?  That's 
> promising,
> 
> 
> +1
> 
> https://packages.debian.org/bullseye/zfs-dkms
> 
> https://packages.debian.org/bullseye/zfsutils-linux

yeah

> [...]
> If you already have a ZFS pool, the way to back it up is to replicate 
> the pool to another pool.  Set up an external drive with a pool and 
> replicate your server pool to that periodically.

No, the data to back up is mostly (or even all) on btrfs.  IIRC, btrfs has a
send feature, but I'd rather not do anything complicated and just copy the
files over with rsync.  It's not like I could replicate a single volume/pool,
because the data comes from different machines and all backs up to one volume.
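
Something like this, I suppose, and the receiving side could still snapshot
each run (host names made up; assumes /srv/backup is a btrfs subvolume and
/srv/snapshots is on the same filesystem):

  # pull each client into its own directory on the backup volume
  for h in host1 host2 host3; do
      rsync -aH --delete "$h":/home/ /srv/backup/"$h"/home/
  done
  # a read-only snapshot then keeps that run as a generation
  btrfs subvolume snapshot -r /srv/backup /srv/snapshots/$(date +%F)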

