
Re: Backup Times on a Linux desktop



Alessandro Baggi wrote:

> If I'm not wrong deduplication "is a technique for eliminating duplicate
> copies of repeating data".
> 
> I'm not a borg expert, but as I understand it, borg performs
> deduplication on data chunks.
> 
> Suppose you back up 2000 files in a day, and within this backup one
> deduplicated chunk is referenced by 300 files. If that chunk is broken,
> I think you lose it in all 300 referencing files. This is not good for
> me.
> 

Look at the explanation by Linux-Fan; I think it is pretty good. It covers
one scenario. However, if your backup system (disks or whatever) is broken,
it cannot be considered a backup system at all.

I think deduplication is a great thing nowadays - people need to back up
terabytes, take care of retention, etc. I do not share your concerns at all.
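
To make the mechanism concrete, here is a toy Python sketch of chunk-level
deduplication (fixed-size chunks and made-up file names for simplicity;
borg actually uses content-defined, variable-size chunking):

    import hashlib

    CHUNK_SIZE = 4096   # fixed-size chunks; real tools split on content

    store = {}   # chunk hash -> chunk bytes, stored exactly once
    index = {}   # file name  -> list of chunk hashes (references)

    def backup(name, data):
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            store.setdefault(h, chunk)   # identical chunks stored only once
            refs.append(h)
        index[name] = refs

    def restore(name):
        return b"".join(store[h] for h in index[name])

    # two files sharing content reference the same stored chunk
    backup("a.bin", b"x" * 8192)
    backup("b.bin", b"x" * 8192 + b"tail")
    assert restore("a.bin") == b"x" * 8192
    print(len(store), "unique chunks for", len(index), "files")

If the single stored copy of a chunk is damaged, every file whose reference
list contains it is affected - which is exactly the scenario described
above, and why repository integrity checks matter.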

> If your main dataset has a broken file, no problem: you can recover it
> from your backups.
> 
> If a saved deduplicated chunk is broken, all files that reference it
> could be broken. I also think the same chunk will be reused by
> successive backups (again because of deduplication), so this single
> chunk could be referenced from backup1 through backupN.
> 

This is not true.

> It also has an integrity check, but I don't know whether it checks
> this. I have also read that an integrity check on a big dataset can
> take too much time.
> 
> In my mind a backup is a copy of a file at a point in time; if needed,
> another copy from another point in time can be picked, but it should
> not be a reference to a previous copy. Today there are people who make
> backups on tape (expensive) for reliability. I run backups on disks.
> Disks are cheap, so compression (which costs time during backup and
> restore) and deduplication (which adds complexity) are not needed for
> me, and they don't really affect my free disk space because I can add
> a disk.
> 

I think it depends on how far you want to go - on how precious the data is.
Magnetic disks and tapes can be destroyed by an EMP or similar. An SSD,
despite its price, can also fail, and if it fails you may not be able to
recover anything from it.
So ... there are rules for securely preserving backups - but all of this is
very expensive.
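
On the integrity-check point quoted above: a chunk store can detect this
kind of corruption by re-hashing every stored chunk and comparing against
its key. A standalone Python sketch of the idea (borg's real check is
"borg check", with --verify-data for a full, slow re-read of all data):

    import hashlib

    def check_store(store):
        # re-hash every chunk; a mismatch means the chunk (and every
        # file referencing it) is corrupted
        return [h for h, chunk in store.items()
                if hashlib.sha256(chunk).hexdigest() != h]

    chunk = b"x" * 4096
    store = {hashlib.sha256(chunk).hexdigest(): chunk}
    store[next(iter(store))] = b"garbage"   # simulate bit rot
    print("corrupted chunks:", check_store(store))

Re-reading every chunk is what makes a full verification slow on terabyte
repositories, as the quoted text suspects.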

> Rsnapshot uses hardlinks, which is similar.
> 
> All these solutions are valid if they fit your needs. You must decide
> how important the data inside your backups is, and whether losing a
> deduplicated chunk could damage your backup dataset across its whole
> timeline.
> 

No, unless the corruption is on the backup server; but if that happens ...
well, you should consider the backup server broken - I do not think it has
anything to do with deduplication.
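
For what it's worth, hardlink-based snapshots share the same property: an
unchanged file in successive rsnapshot snapshots is a single inode, so
there is still only one physical copy on disk. A small Python sketch
(directory names are made up):

    import os, tempfile

    root = tempfile.mkdtemp()
    snap0 = os.path.join(root, "daily.0")
    snap1 = os.path.join(root, "daily.1")
    os.makedirs(snap0)
    os.makedirs(snap1)

    # daily.0 holds the real file; daily.1 hardlinks it, as rsnapshot does
    path0 = os.path.join(snap0, "data.txt")
    path1 = os.path.join(snap1, "data.txt")
    with open(path0, "w") as f:
        f.write("original content\n")
    os.link(path0, path1)

    # damaging the shared inode damages the file in every snapshot
    with open(path0, "w") as f:
        f.write("corrupted\n")
    print(open(path1).read())   # prints "corrupted"

So whether sharing happens via deduplicated chunks or via hardlinks, a
single on-disk corruption can surface in many backups at once - which, as
noted above, really means the backup medium itself has failed.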

> Ah, if you have multiple servers to back up, I prefer bacula because
> it can pull data from hosts and back up multiple servers from a single
> point (maybe using a separate bacula-sd daemon with dedicated storage
> for each client).


