
Re: Home made backup system



On Thu, Dec 26, 2019 at 09:51:59AM -0500, rhkramer@gmail.com wrote:
> Just to confirm, I assume that is true ("no way to skip ahead to byte 31337") 
> even if the underlying media is a (somewhat random access) disk instead of 
> (serial access) tape?

Correct.  There's no central index inside the tar archive that says
"file xyz begins at byte 12345".  This is by design, so that you can
append new content to an existing tar archive.  When you append a new
file to an existing archive, you simply drop a new metadata header
record, and then the new content.  So, the entire archive is a long
string of

header file header file header file ....

The only way to find a file is to walk the archive from the beginning,
one header at a time, until you reach the file you want.
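
To make the layout concrete, here is a minimal sketch using Python's
standard tarfile module.  The member names and contents are made up for
illustration, and the archive lives in memory rather than on a tape or
pen drive.  It builds a small archive, appends one more member, and then
locates a member the only way the format allows: by visiting the headers
in order.

```python
import io
import tarfile


def add_bytes(tar, name, data):
    """Add one in-memory file: a header record followed by its content."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def build_demo_archive():
    """Create a two-member archive, then append a third member to it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        add_bytes(tar, "a.txt", b"alpha")
        add_bytes(tar, "b.txt", b"bravo!")

    # Appending writes a new header and the new content after the last
    # existing member; there is no index to update.
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="a", format=tarfile.USTAR_FORMAT) as tar:
        add_bytes(tar, "c.txt", b"charlie")
    return buf.getvalue()


def find_member_offsets(archive_bytes, wanted):
    """Visit headers in order until `wanted` turns up.

    Returns (header_offset, data_offset), or None if the name is absent.
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        for member in tar:
            if member.name == wanted:
                return member.offset, member.offset_data
    return None


if __name__ == "__main__":
    archive = build_demo_archive()
    print(find_member_offsets(archive, "c.txt"))
```

The offsets only become known as a side effect of the walk.  Nothing in
the archive records them ahead of time.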

> Again, I assume (I know what assume does) that "USB mass-storage device that 
> acts like a hard drive" is (or might be) a pen drive type of device.

Yes.

> I've had 
> a lot of bad luck (well, more bad luck than I'd like) with that kind of 
> device, and I suspect that the problem is more likely to occur when parts of 
> the device are erased to allow something new to be written to it.
> 
> In other words, I suspect it would be more reliable if it functioned a little 
> bit more like a WORM (Write Once, Read Many) type device

"Write Once, Read Many" is an entirely different data storage paradigm.
Think of a large dusty vault full of optical media.  Once you've backed up
your full database (or whatever) to one of these media, it goes into
the vault.  You can't reuse the medium, nor do you WANT to, for legal
reasons.  You've chosen this technology specifically because it CANNOT
be altered once written, and therefore gives you some sort of debatably
reliable legal trail of evidence.  "On May 7th, this is what we had."

Very expensive, and very niche.

> -- not that the whole 
> device necessarily has to be written in one go, but more that, for highest 
> reliability, data is appended by writing in previously unused locations 
> rather than deleting some data, and then writing new data in previously used 
> and erased locations.

I am not an expert in solid state storage, so I won't even try to
address the questions about long-term reliability of various USB mass
storage devices.

For most people, it comes down to "when you can't write to the device
any more, you throw it away and get another".

> I don't know whether rsync, in the normal course of events will delete (erase) 
> and write data in previously used locations, but it would be helpful to have 
> comments, with respect to:
> 
>    * whether rsync will rewrite to previously used locations, [...]

Rsync does not operate at the disk sector level.  It operates at the
file level.  If you've modified a file since the last backup, then rsync
knows it needs to modify the backed-up copy of the file.  It will use
various algorithms to decide whether it should just copy the entire
file from the source, or try to preserve pieces of the file that are
already on the destination.

The main goal there is to reduce the transmission of bytes from a
source host to a destination host, because one of rsync's main use
cases is backing up files across a network.

Since you're focusing on the case where there's no network involved,
a lot of that work is just not relevant.  In the end, as far as I
understand it, rsync will create a new file on the destination, which
contains the new content (however it gets the new content).  Then the
older copy of the file will be deleted.
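
As a rough illustration of that "write a new file, then retire the old
one" pattern, here is a minimal Python sketch.  This is not rsync's
code; the function name replace_file and the ".tmp-" prefix are
hypothetical, and it simply assumes the new content is already in hand.

```python
import os
import tempfile


def replace_file(dest_path, new_content):
    """Write new_content to a fresh file, then swap it in for dest_path."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))

    # The new copy is created alongside the old one, so for a moment
    # both files exist on the destination.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The rename makes the new copy visible under the old name; the
        # old copy's blocks are then released by the filesystem.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

From the program's point of view, nothing here says where on the device
the new bytes land.  That decision is made further down the stack.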

How the storage device's controller works (how it decides which parts
of the device get the new file, how the part where the old file used to
be gets recycled, etc.) is outside of rsync's purview, and definitely
outside of *my* personal knowledge.

