
Re: Backup Times on a Linux desktop



On 04/11/19 20:43, deloptes wrote:
Alessandro Baggi wrote:

If I'm not wrong, deduplication "is a technique for eliminating duplicate
copies of repeating data".

I'm not a borg expert, but it performs deduplication on data chunks.

Suppose that you back up 2000 files in a day and inside this backup a
chunk is deduped and referenced by 300 files. If the deduped chunk is
broken, I think you will lose it in all 300 referencing files/chunks. This
is not good for me.
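
To make the scenario concrete, here is a toy Python sketch of
content-addressed deduplication (only an illustration of the idea, not
borg's actual code or storage format):

import hashlib

chunk_store = {}   # chunk id -> chunk bytes, stored only once
file_index = {}    # file name -> list of chunk ids

def add_file(name, chunks):
    ids = []
    for data in chunks:
        cid = hashlib.sha256(data).hexdigest()
        chunk_store.setdefault(cid, data)   # deduplicated: kept only the first time
        ids.append(cid)
    file_index[name] = ids

shared = b"A" * 4096                        # same content appears in many files
for i in range(300):
    add_file("file%03d" % i, [shared, b"unique-%d" % i])

print(len(chunk_store))                     # 301 chunks stored, not 600

# Simulate corruption of the single shared chunk:
shared_id = hashlib.sha256(shared).hexdigest()
chunk_store[shared_id] = b"garbage"
damaged = [n for n, ids in file_index.items() if shared_id in ids]
print(len(damaged))                         # all 300 referencing files are affected

This is exactly the fan-out I am worried about.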


Look at the explanation by Linux-Fan. I think it is pretty good. It fits one
scenario, however if your backup system (disks or whatever) is broken, it
cannot be considered a backup system at all.


Linux-Fan's reply is interesting, but there is nothing new in it for me.

I think deduplication is a great thing nowadays - people need to back up TBs,
take care of retention etc. I do not share your concerns at all.

If your main dataset has a broken file, no problem, you can recover
from backups.

If your saved deduped chunk is broken, all files that reference it
could be broken. I also think that the same chunk will be used for
successive backups (again because of deduplication), so this single chunk
could be used from backup1 to backupN.


This is not true.


What is not true?
That the same single chunk will not be used inside other backups? So is a deduped chunk related only to one backup?
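
Just so we are talking about the same thing, this is how I picture it in the
same toy model as above (my own sketch, not borg's actual format): successive
backups are just new lists of chunk ids over the same chunk store, so an
unchanged file contributes the same id to backup1 ... backupN.

import hashlib

chunk_store = {}
archives = {}                      # archive name -> {file name: chunk ids}

def backup(archive, files):
    index = {}
    for name, data in files.items():
        cid = hashlib.sha256(data).hexdigest()
        chunk_store.setdefault(cid, data)
        index[name] = [cid]
    archives[archive] = index

unchanged = {"report.txt": b"same content every day"}
for day in range(1, 6):
    backup("backup%d" % day, unchanged)

print(len(chunk_store))            # 1: one stored chunk shared by backup1..backup5

That is how I understand it; please correct me if borg does something different.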


It also has an integrity check, but I don't know if it checks for this. I have
also read that an integrity check on a big dataset could require too much time.
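
What I mean by the time cost: in the same toy model, a data integrity check
has to re-read every stored chunk and compare it with its id, so the work
grows with the total repository size (borg's real check is surely more
involved; this is just the idea):

import hashlib

def verify(chunk_store):
    # Re-hash every stored chunk and report the ids that no longer match.
    bad = []
    for cid, data in chunk_store.items():
        if hashlib.sha256(data).hexdigest() != cid:
            bad.append(cid)
    return bad

On the corrupted store from the first sketch, verify() would report the one
damaged chunk id, and the file index would then tell you which 300 files are
affected.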

In my mind a backup is a copy of a file at a point in time, and if needed
another copy from another point in time can be picked, but it should not be
a reference to a previous copy. Today there are people who make backups
on tape (expensive) for reliability. I run backups on disks. Disks are
cheap, so compression (which costs time during backup and restore) and
deduplication (which adds complexity) are not needed for me, and they don't
really affect my free disk space because I can just add a disk.
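
To put rough numbers on it (these figures are invented, only to show the
scale I have in mind):

dataset_gb = 500         # hypothetical dataset size
backups = 30             # hypothetical number of retained backups
daily_change = 0.02      # hypothetical fraction of data changed per day

plain_copies_gb = dataset_gb * backups
deduped_gb = dataset_gb + dataset_gb * daily_change * (backups - 1)

print(plain_copies_gb)   # 15000 GB as independent full copies
print(deduped_gb)        # about 790 GB if unchanged data is stored once

Even the plain-copy number is only a few cheap disks for me, so the saving
does not justify the extra complexity in my case.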


I think it depends on how far you want to go - on how precious the data is.
Magnetic disks and tapes can be destroyed by an EMP or similar. An SSD,
despite its price, can fail, and if it fails you cannot recover anything.
So ... there are some rules for securely preserving backups - but all of this
is very expensive.


EMP or similar? You are right, but in my experience I have seen only one case where such an event broke storage, and that was a laptop disk near a radar station. How often could this happen?

Rsnapshot uses hardlinks, which is similar.
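
The idea behind the hardlinks, roughly (a generic Python illustration, not
rsnapshot itself; the paths are made up):

import os, tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "daily.0"))
os.makedirs(os.path.join(root, "daily.1"))

src = os.path.join(root, "daily.0", "notes.txt")
with open(src, "w") as f:
    f.write("unchanged content\n")

dst = os.path.join(root, "daily.1", "notes.txt")
os.link(src, dst)                  # a hardlink instead of a second copy

print(os.stat(src).st_ino == os.stat(dst).st_ino)   # True: same inode
print(os.stat(src).st_nlink)                        # 2 directory entries, one copy of the data

An unchanged file in a new snapshot is just another name for the same inode,
so it costs almost no extra space - but it is still a single copy of the
data, similar to a deduped chunk.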

All these solutions are valid if they fit your needs. You must decide how
important the data inside your backups is and whether losing a deduped
chunk could damage your backup dataset across the whole timeline.


No, unless the corruption is on the backup server, but if that happens ... well,
you should consider the backup server broken. I do not think it has
anything to do with deduplication.

Ah, if you have multiple servers to back up, I prefer bacula because it can
pull data from the hosts and can back up multiple servers from the same point
(maybe using a separate bacula-sd daemon with dedicated storage for each
client).




