
Re: Homebuilt NAS Advice



Leslie Rhorer wrote:

> In this context what, exactly, is de-duplication?  I fail to see how
> any meaningful interpretation of the term is salient to backups.  To
> compression, yes, to symbolic interpretation, surely, and to saving
> space on a drive and reducing access times, you bet.  To backups?  I
> don't really see it, unless you mean hard-link handling, which it does
> most admirably.  Soft links, of course, are fairly straightforward.  DAR
> does handle sparse files exceedingly well.
> 

Imagine you have a classical backup scheme: daily incrementals, a weekly full
and a monthly full. Imagine the retention is 3 for the weekly fulls (until the
end of the month) and 12 for the monthly fulls (until the end of the year).
You have to maintain 15 full backups plus the 6 daily incrementals. How much
space do you need for your backup storage?
This is why I asked what your active size is. Now imagine I have 2TB of
data. Even if I compress this data, let's say with an average ratio of 60%,
it is 800GB per full backup. 15+ copies means 12TB+. Of course, if you have
video or audio such as mp3, it is already compressed, so the backup
compression ratio goes down and the space needed goes up.
Now here comes the trick with deduplication. The backup system makes one
full backup (800GB) and then keeps track of the bits that changed (it is
not that simple, but it will do for the example). Only those are backed up.
Some systems achieve a ratio of 90%. So to keep your 15+ copies with a
deduplication ratio of ~80% you need about 3TB.
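
To make the arithmetic above easy to re-run with your own numbers, here is a
small Python sketch. It assumes that "60% ratio" and "~80% ratio" both mean
the fraction of space saved; the constant and function names are made up for
the illustration. The plain 80% figure works out to 2.4TB, so "about 3TB"
presumably leaves some room for the daily incrementals.

```python
# Rough storage estimate using the figures from the example above.
# Assumption: both "ratios" are read as the fraction of space saved.
ACTIVE_TB = 2.0             # active data size
COMPRESSION_SAVING = 0.60   # 60% saved by compression
FULL_COPIES = 15            # 3 weekly fulls + 12 monthly fulls
DEDUP_SAVING = 0.80         # ~80% saved by deduplication


def full_backup_tb(active_tb, compression_saving):
    """Size of one compressed full backup."""
    return active_tb * (1.0 - compression_saving)


def classical_storage_tb(active_tb, compression_saving, copies):
    """Every retained full backup is stored in its entirety."""
    return full_backup_tb(active_tb, compression_saving) * copies


def deduplicated_storage_tb(active_tb, compression_saving, copies, dedup_saving):
    """Same retention, but only the non-duplicated share is stored."""
    classical = classical_storage_tb(active_tb, compression_saving, copies)
    return classical * (1.0 - dedup_saving)


one_full = full_backup_tb(ACTIVE_TB, COMPRESSION_SAVING)
classical = classical_storage_tb(ACTIVE_TB, COMPRESSION_SAVING, FULL_COPIES)
dedup = deduplicated_storage_tb(
    ACTIVE_TB, COMPRESSION_SAVING, FULL_COPIES, DEDUP_SAVING
)

print(f"one full backup:    {one_full:.1f} TB")
print(f"classical, 15 full: {classical:.1f} TB")
print(f"with deduplication: {dedup:.1f} TB")
```

The incrementals are left out on purpose, since their size depends entirely
on how much of your data changes per day.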



>> May I ask what is your active disk size
> 
> What do you mean by "active" disk size?  In each of my main arrays
> there are 8 spindles of 8 Terabytes each.  Six spindles worth are
> encoded with flat data and 2 spindles worth with parity.  RAID 6 does
> not assign any disks specifically for data or for parity as RAID 3 and
> RAID 4 do.  Instead, with both RAID 5 and RAID 6, parity is distributed
> across every drive, and the data is also distributed across all the
> drives, interleaved with the parity.  All put together, the available
> volume size is 46.9 Terabytes (43.6 Tebibytes) after formatting.  The
> main server currently has 22 Terabytes of data on it.  The backup server
> is effectively full.
> 

So you have a perfect candidate for deduplication :) because I guess you can
keep only a few copies of that size on the backup server.

Here is a live example from one of the servers running borg.

The backup archive:

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                5.47 TB              3.37 TB            483.58 GB

                       Unique chunks         Total chunks
Chunk index:                 2342114             23644792
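
For anyone wondering what "Unique chunks" versus "Total chunks" means, here
is a toy Python sketch of a chunk store. It is only an illustration under
simplifying assumptions: it cuts the data into fixed-size chunks, whereas
borg uses content-defined (variable-size) chunking, and the ChunkStore class
and its methods are invented for this example, not borg's API.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept only once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}        # SHA-256 digest -> chunk bytes (unique chunks)
        self.total_chunks = 0   # every chunk reference made by any archive

    def add_archive(self, data):
        """Split data into chunks, store new ones, return the digest list."""
        refs = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only a chunk that has never been seen before uses new space.
            self.chunks.setdefault(digest, chunk)
            self.total_chunks += 1
            refs.append(digest)
        return refs

    def restore(self, refs):
        """Rebuild an archive from its list of chunk digests."""
        return b"".join(self.chunks[digest] for digest in refs)


store = ChunkStore(chunk_size=4)
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBXXXXDDDD"   # one 4-byte block changed since Monday

refs_monday = store.add_archive(monday)
refs_tuesday = store.add_archive(tuesday)

print("unique chunks:", len(store.chunks))   # 5
print("total chunks: ", store.total_chunks)  # 8
```

Both "full" archives can still be restored completely, yet the second one
added only a single new chunk to the store.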

The last monthly archive:

Archive name: 2020-07-04T22:01:21
Archive fingerprint: xxxxxxxxxxxxxxxxx
Comment:
Hostname: xxxx
Username: xxxx
Time (start): Sat, 2020-07-04 22:01:32
Time (end): Sat, 2020-07-04 23:28:51
Duration: 1 hours 27 minutes 19.92 seconds
Number of files: 3416089
Command line: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Utilization of maximum supported archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              807.19 GB            493.13 GB             14.90 GB
All archives:                5.47 TB              3.37 TB            483.58 GB

                       Unique chunks         Total chunks
Chunk index:                 2342114             23644792
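
To put those numbers in proportion, here is a short Python sketch that turns
them into percentages. The figures are copied from the output above; the
variable names are made up, and it assumes the sizes are in decimal units
(1 TB = 1000 GB).

```python
# Figures copied from the borg output above.
# Assumption: decimal units, so 1 TB = 1000 GB.
ALL_ORIGINAL_GB = 5.47 * 1000
ALL_DEDUPLICATED_GB = 483.58
THIS_ORIGINAL_GB = 807.19
THIS_DEDUPLICATED_GB = 14.90
UNIQUE_CHUNKS = 2342114
TOTAL_CHUNKS = 23644792


def share(part, whole):
    """Fraction of 'whole' that 'part' represents."""
    return part / whole


# Disk space actually used, relative to the sum of all archive sizes.
stored_share = share(ALL_DEDUPLICATED_GB, ALL_ORIGINAL_GB)
# New data the last monthly archive added, relative to its full size.
new_share = share(THIS_DEDUPLICATED_GB, THIS_ORIGINAL_GB)
# Chunks that are physically stored, relative to all chunk references.
chunk_share = share(UNIQUE_CHUNKS, TOTAL_CHUNKS)

print(f"repository stores {stored_share:.1%} of the original data")
print(f"last monthly added {new_share:.1%} of its own size")
print(f"unique chunks are {chunk_share:.1%} of all chunk references")
```

So under these assumptions the whole repository takes roughly 9% of the
original size, and the last monthly full cost under 2% of its own size in
new storage.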




