
Re: reinstallation and restore after catastrophic mistake or failure; was: 1 Currently unreadable (pending) sectors How worried should I be?



On 6 Jan 2024 00:37 -0800, from dpchrist@holgerdanske.com (David Christensen):
> I suggest taking an image (backup) with dd(1), Clonezilla, etc., when you're
> done.  This will allow you to restore the image later -- to roll-back a
> change you do not like, to recover from a disaster, to clone the image to
> another device, to facilitate experiments (such as doing a secure erase to
> see if it resolves the SSD pending sector issue), etc.
> 
> If you also keep your system configuration files in a version control
> system, restoring an image is faster than wipe/ fresh install/ configure/
> restore data.

I would go even further. Backups should be designed such that
recovering from a catastrophic storage failure (such as getting hit by
ransomware, unintentionally doing a destructive badblocks write test,
or the sudden failure of a storage device) requires at most
something very similar to:

* Boot some kind of live environment
* Set up file systems on the storage device to be restored onto
  (partitioning, setting up LUKS containers, formatting, whatever else
  might be called for)
* Within the live environment, install and configure any software
  needed to access the backup (this may include things like
  cryptographic keys, access passphrases and the like)
* Perform the restoration from the most recent backup (this is the
  part that likely will take a significant amount of time)
* Update the restored copies of /etc/fstab, /etc/crypttab and any
  other files that directly reference the partitions or file systems
  by some kind of ID (UUID, /dev/disk/by-*/*, ...)
* Reinstall the boot loader
* Reboot
* Reinstall the boot loader again from within the restored environment
  to ensure that everything relating to it is in sync
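
To illustrate the step of updating /etc/fstab and /etc/crypttab in the
list above, here is a minimal Python sketch. It assumes that the files
reference file systems by UUID and that you have noted which old UUID
corresponds to which newly created one. The function name remap_uuids
and the uuid_map parameter are made up for this example; they are not
part of any existing tool.

```python
import re


def remap_uuids(config_text: str, uuid_map: dict[str, str]) -> str:
    """Return config_text with each old UUID replaced by its new UUID.

    uuid_map maps old UUID strings to new ones (hypothetical input that
    you would collect yourself, for example from the backup's copy of
    /etc/fstab and from the newly created file systems).
    """
    if not uuid_map:
        return config_text
    alternatives = "|".join(re.escape(old) for old in uuid_map)
    # Only match a UUID that is not embedded in a longer run of
    # hex digits or hyphens, so partial matches are left alone.
    pattern = re.compile(
        r"(?<![0-9A-Fa-f-])(" + alternatives + r")(?![0-9A-Fa-f-])"
    )
    return pattern.sub(lambda m: uuid_map[m.group(1)], config_text)
```

The function only transforms text; reading the restored file, keeping a
copy of the original and writing the result back are left to the
caller. It handles both the UUID=... form and /dev/disk/by-uuid/...
paths, since it replaces the identifier wherever it appears as a whole
token.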

Such recovery should _not_ need to involve significant reconfiguration
of anything. Any such requirements will massively increase your time
to recovery, as I think we're seeing an example of here. And yes,
pretty much all of this could be scripted, but I strongly suspect that
few people need to do a bare-metal restore of their most recent backup
often enough for _that_ to be worth the effort to create and maintain.
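
For anyone who does want to script it, a cautious starting point is to
generate the command list and review it before running anything. The
Python sketch below only builds and prints commands; it executes
nothing. Everything specific in it is an assumption for illustration:
a single LUKS-encrypted ext4 root file system on an already
partitioned disk, a backup that is a plain directory tree restorable
with rsync, GRUB as the boot loader, and the names build_restore_plan,
format_plan, cryptroot and /mnt/target.

```python
import shlex


def build_restore_plan(disk, root_part, backup_dir,
                       target="/mnt/target", mapper_name="cryptroot"):
    """Return the restore steps as a list of argv lists (nothing is run)."""
    mapped = f"/dev/mapper/{mapper_name}"
    return [
        # Set up the LUKS container and the file system inside it.
        ["cryptsetup", "luksFormat", root_part],
        ["cryptsetup", "open", root_part, mapper_name],
        ["mkfs.ext4", mapped],
        ["mount", mapped, target],
        # Restore the backup, preserving hard links, ACLs and xattrs.
        ["rsync", "-aHAX", backup_dir.rstrip("/") + "/",
         target.rstrip("/") + "/"],
        # /etc/fstab and /etc/crypttab need updating at this point;
        # that is a manual step and is not covered by this sketch.
        ["grub-install", f"--boot-directory={target}/boot", disk],
    ]


def format_plan(plan):
    """Render the plan as shell-quoted lines for review."""
    return "\n".join(shlex.join(cmd) for cmd in plan)
```

Calling print(format_plan(build_restore_plan("/dev/sdX", "/dev/sdX2",
"/mnt/backup"))) prints one command per line. The device names are
placeholders, and a real setup with a separate /boot, an EFI system
partition, LVM or a dedicated backup tool would need different steps.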

Which is not to say that keeping configuration files
version-controlled cannot provide benefits anyway; but given a proper,
frequent backup regime, the benefits even of that are reduced.

-- 
Michael Kjörling                     🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

