
Re: rsync --delete vs rsync --delete-after



On Fri, 2024-01-26 at 16:11 +0100, hw wrote:
> I've never had issues with any UPS due to self tests.  The batteries
> need to be replaced when they are worn out.  How often that is
> required depends on the UPS and the conditions it is working in,
> usually every 3--5 years.

It was with some small to mid-size APC model, I think. We had about 1 to
2 kW worth of servers on it, so it was not that small, definitely not a
consumer model. When I took over maintenance, somebody had configured
some sort of weekly or biweekly self-test that switched over to battery,
was supposed to run the battery down to about 25%, and then return to
mains power/charging.

Except that once, what the UPS considered 25% charge apparently wasn't,
and everything shut down instantly.
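
For anyone monitoring such a unit with apcupsd, conservative shutdown
thresholds are a cheap guard against exactly this kind of surprise. A
minimal sketch of the relevant apcupsd.conf knobs (assuming apcupsd is
in use at all; the values are only illustrative):

  # /etc/apcupsd/apcupsd.conf (excerpt)
  BATTERYLEVEL 50   # start a clean shutdown once charge drops below 50%
  MINUTES 10        # ...or once estimated runtime falls below 10 minutes
  TIMEOUT 0         # 0 = never shut down after a fixed time on battery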

> I rather spend the money on new batteries (EUR 40 last time after 5
> years) every couple years rather than spending thousands on replacing
> the hardware when a power surge damages it which could have been
> prevented by the UPS, and it's better to have the machines shut down
> properly rather taking risks with potential data loss, regardless of
> file systems and RAID setups in use.

I think having hardware worth "thousands" combined with a UPS that takes
such cheap batteries is not that common. In the company mentioned above
we certainly had hardware worth thousands, but changing the batteries
cost hundreds of Euros, even with off-brand aftermarket parts. It was
also complicated to order the right parts in the first place.

> RAID isn't as complicated as you think.  Hardware RAID is most
> simple,
> followed by btrfs, followed by mdadm.

I have to disagree with that too. Some hardware RAIDs might be simple,
but others are not. Tracking Adaptec through its rebrandings,
acquisitions and mergers is a science in itself, as is finding and
installing their firmware and utilities. Are they still called Avago,
or something new again?

Or all that BBU stuff: tracking the state of the battery backup unit on
the controller, and ordering and replacing the correct battery, is not
really easy either. It is clearly enterprise IT type of stuff, keeping
even knowledgeable people busy for hours if you don't do it at scale
and regularly.

Linux support is also often problematic. Yes, it will work, but
sometimes certain utilities are not available, or don't work as well as
they do under Windows.

On the other hand mdadm software RAID is well documented and painless.
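
Replacing a failed disk there usually boils down to a handful of
well-documented commands. A rough sketch (device names like /dev/md0
and /dev/sdb1 are placeholders for your own setup):

  cat /proc/mdstat                              # which array is degraded?
  mdadm --detail /dev/md0                       # which member has failed?
  mdadm --manage /dev/md0 --fail /dev/sdb1      # mark it faulty (if not already)
  mdadm --manage /dev/md0 --remove /dev/sdb1    # drop it from the array
  # physically swap the disk, partition it like the old one, then:
  mdadm --manage /dev/md0 --add /dev/sdb1       # re-add; the rebuild starts
  watch cat /proc/mdstat                        # follow the resync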

> 
> With hardware RAID I can instruct someone who has no idea what
> they're
> doing to replace a failed disk remotely.  Same goes for btrfs and
> mdadm, though it better be someone who isn't entirely clueless

In fact this was my job for some time: administering hardware-RAID-
equipped servers and instructing "remote hands" or customers to swap
hard disks. It was not always easy, and the correct disks were not
always the ones that got pulled, even though they were correctly
labelled. Sometimes clueless people tried swapping disks by themselves
and mixed things up. We also had one server with wrong labelling, for
whatever reason. That was no fun ;)

Now I won't dispute that RAID has its place in data centers and many
other applications. I just doubt that it is the correct choice for many
home users.

> More importantly, the hassle involved in trying to recover from a
> failed disk is ridiculously enormous without RAID and can get
> expensive when hours of work were lost.  With RAID, you don't even
> notice unless you keep an eye on it, and when a disk has failed, you
> simply order a replacement and plug it in.

Yes, that can happen. But more often than not the scenario is the same
as with most notebooks today: you send your notebook in for repair and
have to reinstall anyway. Happened to me. I backed up my Debian system,
sent the device in for hardware repair, and got it back with Windows 10 ;)
And no, it was not the disk that was broken, but the touchpad.

> 
> It's not like you could go to a hardware store around the corner and
> get a new disk same or next day.  Even if you have a store around,
> they will need to order the disk, and that can, these days, take
> weeks
> or months or longer if it's a small store. 

For consumer hard disks? I just go to my favourite shop if I need a
replacement, and they've got maybe 20 or 30 types of hard disk in
stock, ready to be bought right away. Even more when it comes to SSDs.
And I am in a smallish city, pop. 250,000.


> That is simply wrong.  RAID doesn't protect you from malware, and
> nothing protects you from user error.  If you have data losses from
> malware and/or user error more often than from failed disks, you're
> doing something majorly wrong.

In my experience user error is the main source of data loss. By far.

> This shows that you have no experience with RAID and is not an
> argument.

I've got years of experience with RAID, both personally and with
employers running RAID for customers and internal services. In my
experience RAID is a nice solution for data-center-type setups, but it
is often problematic for home users or even small offices.

> Making backups is way more complicated than RAID.  You can way more
> easily overwrite the wrong backup or misinterpret error messages of
> your backup solution than you can pull the wrong disk from a RAID or
> misinterpret error messages from your RAID.

Yes, making backups is hard. But my main point is: you need good
backups anyway, so RAID does not help you here. If you neglect backups
because you've got RAID, you are living dangerously. And some people
actually do this. I think it is wrong and asking for trouble.
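
To come back to the subject of this thread: a plain rsync mirror
already goes a long way as a backup. A rough sketch (the paths are
placeholders, and whether you want --delete or --delete-after is
exactly the question under discussion):

  # mirror /home to a backup disk; files removed at the source are
  # removed from the mirror only after the transfer has finished
  rsync -aHvx --delete-after /home/ /mnt/backup/home/
  # add -n (--dry-run) first to see what would happen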

> How exactly would you pull the wrong disk from a RAID and thus cause
> data loss?  Before you pull one, you make a backup. 

Often people with RAID setups do not do this (making a full backup
before a disk swap) for various reasons. 

> When the disk has
> been pulled, its contents remain unchanged and when you put it back
> in, your data is still there --- plus you have the backup.  Sure it
> can sometimes be difficult to tell which disk you need to replace,
> and it's not an issue because you can always figure out which one you
> need to replace.  You can always tell with a good hardware RAID
> because it will indicate on the trays which disk has failed and the
> controller tells you.

I've seen (well, not seen, they were on the other end of the phone)
people delay disk swaps in a RAID until not one but two (of 5 or 6)
physical drives were broken. Then they were instructed to pull, say,
drives 2 and 3, pulled the wrong ones (say, 1 and 2), tried to correct
their error by pulling yet another one, and boom. Yes, in theory the
data were probably still there and could have been recovered with
forensic techniques. But by that time the array was offline, and it was
major repair/data rescue/restore-from-backup time.
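
With Linux software RAID, at least, you can double-check which physical
disk you are about to pull before touching anything. A minimal sketch
(device names are placeholders):

  mdadm --detail /dev/md0    # which member device has failed?
  ls -l /dev/disk/by-id/     # map /dev/sdX names to model and serial number
  smartctl -i /dev/sdb       # prints the serial that is on the drive's label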

> No, I generally don't have spares, and I don't leave my backup server
> running all the time to make backups every few hours or every day
> because electricity is way too expensive, plus it's somewhat loud and
> gives off quite a bit of heat.

This is the personal compromise that I make.
 
> How often do you verify that you can actually restore everything from
> your backups, and how do you do that?

Well, I don't regularly try to restore "everything", but as some piece
of hardware breaks regularly, most often my laptop, I get to try out
replacing systems once in a while. The last disk that failed on me was
an SSD in my Proxmox server at home; I did a clean install of Proxmox,
restored the guest backups, and it worked like a charm. Finding a
replacement disk was the hardest part.
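
The restore itself is basically one command per guest, once the backup
storage is mounted again. A sketch (file names, VM IDs and storage
names are made up for illustration):

  # KVM guest from a vzdump archive:
  qmrestore /mnt/backup/dump/vzdump-qemu-100.vma.zst 100 --storage local-lvm
  # LXC container:
  pct restore 101 /mnt/backup/dump/vzdump-lxc-101.tar.zst --storage local-lvm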

Migrating to new hardware is also a good time to test your backups.
When I moved my Home Assistant from one machine to a new type of setup
on different hardware, I implicitly tested the backups. Worked
flawlessly.

/ralph

