
Re: Disks renamed after update to 'testing'...?



On Thu, Aug 20, 2020 at 01:34:58PM -0700, David Christensen wrote:
> On 2020-08-20 08:32, rhkramer@gmail.com wrote:
> >On Thursday, August 20, 2020 03:43:55 AM tomas@tuxteam.de wrote:
> >>Contrary to the other (very valid) points, my backups are always on
> >>a LUKS drive, no partition table. Rationale is, should I lose it, the
> >>less visible information the better. Best if it looks like a broken
> >>USB stick. No partition table looks (nearly) broken :-)
> 
> I always use a partition table, to reduce the chance of confusing
> myself.  ;-)
> 
> 
> >I have two questions:
> >
> >    * I suppose that means you create the LUKS drive on, e.g., /dev/sdc rather
> >than, for example, /dev/sdc<n>?  (I suppose that should be easy to do.)

Exactly.
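A minimal sketch of that setup, assuming the stick shows up as /dev/sdX
(a placeholder -- check with lsblk first, since luksFormat destroys
whatever is on the target):

```shell
# Whole-device LUKS, no partition table -- the LUKS header sits at the
# start of the raw device. /dev/sdX and the mapping name "backup" are
# placeholders, not from the original mail.
cryptsetup luksFormat /dev/sdX
cryptsetup open /dev/sdX backup       # appears as /dev/mapper/backup
mkfs.ext4 /dev/mapper/backup          # file system inside the container
cryptsetup close backup
```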

> >    * But, I'm wondering, how much bit rot would it take to make the entire
> >backup unusable, and what kind of precautions do you take (or could be taken)
> >to avoid that?

I have no current strategy for silent [1] bit rot. For file system
consistency, I run an fsck from time to time after opening the
LUKS container and before mounting (we are talking about roughly
60-70 GB; were we talking about 100-1000 times as much, active
bit-rot mitigation might be more compelling).
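Spelled out, that open-then-fsck-then-mount order looks roughly like
this (device name and mount point are placeholders):

```shell
# Check the file system after unlocking but before mounting.
cryptsetup open /dev/sdX backup
fsck.ext4 -f /dev/mapper/backup       # -f: force a full check
mount /dev/mapper/backup /mnt/backup
# ... run the backup ...
umount /mnt/backup
cryptsetup close backup
```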

> I have been pondering bit-rot mitigation on non-checksumming filesystems.

The big file systems (ZFS, btrfs) have that; and for really huge
amounts of data (say hundreds of TB, where some corners of your
data might rest unseen for years), it does make sense.

In my case, I consider the backup just as something which I expect
to "fail early" and "fail loudly". In the "normal" case it is perfectly
disposable :-)
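One low-tech way to bolt checksums onto a non-checksumming file system
like ext4 is a hash manifest: regenerate it after each backup run and
verify it on later runs. A sketch (the paths are illustrative, not from
the original mail):

```shell
# Inside the mounted backup: record a SHA-256 for every file, then
# verify on demand -- any silent bit flip makes the check fail loudly.
cd /mnt/backup
find . -type f ! -name SHA256SUMS -print0 \
    | xargs -0 sha256sum > SHA256SUMS      # (re)generate after a backup
sha256sum -c --quiet SHA256SUMS            # nonzero exit on any mismatch
```

This keeps the "fail loudly" property without needing btrfs or ZFS
underneath.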

> Some people have mentioned md RAID.  tomas has mentioned LUKS.  I
> believe both of them add checksums to the contained contents.  So,
> bit-rot within a container should be caught by the container driver.

Don't know about that, to be honest: I count on the ext4 beneath the
LUKS to catch any nasties (and to issue an early warning when the
USB stick starts degrading -- I'm still a bit queasy about how cheap
a 128 GB USB stick can be).

> In the case of md RAID, the driver should respond by fetching the
> data from another drive and then dealing with the bad block(s); the
> application should not see any error (?).  I assume LVM RAID would
> respond like md RAID (?).

Yes. That's why I reserve RAID for the "high availability" case: you
want to keep running after a failure, and your customer doesn't
notice (it would make sense to think about whether this is the
best level to introduce redundancy, but I digress).

>   In the case of LUKS, the driver has no
> redundant data (?) and will have no choice but to report an error to
> the application (?).  I would guess LVM non-RAID would behave
> similarly (?).

Exactly. For the backup scenario, the whole backup /is/ the redundant
data. If the probability of failure of your main system in some
given time interval T is, say, 10^-7, and that of your backup's in
the same interval is, say, 10^-5 (cheaper hardware, and all that),
you're looking at a simultaneous catastrophe with a probability of
10^-12. If you want to improve on that, use two separate backup
media, and you are down to 10^-17 [2].
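The arithmetic, spelled out (independence assumed, per footnote [2];
the probabilities are the example values from the text):

```shell
# Independent failures multiply: main system at 1e-7, each backup
# medium at 1e-5, per interval T.
awk 'BEGIN {
    p_main = 1e-7; p_backup = 1e-5
    printf "one backup:  %.0e\n", p_main * p_backup               # 1e-12
    printf "two backups: %.0e\n", p_main * p_backup * p_backup    # 1e-17
}'
```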

Cheers
[1] silent meaning some bit flips in file content without the
   file system noticing. This is quite possible on ext4; btrfs,
   for example, can (reasonably) guard against it.

[2] This is, of course, "economist maths", the kind which led
   to the 2008-2009 crash: it assumes all those bad events are
   independent. If my house burns down, my computer is in there,
   and my only backup on a stick is in my pocket...

 - t
> 
> 
> For all three -- md, LUKS, LVM -- I don't know what happens for bit
> rot outside the container (e.g. in the container metadata).
> 
> 
> David
> 
