
Re: Disk for a small server



On Wed, 11 Aug 2021 02:53:13 -0700
David Christensen <dpchrist@holgerdanske.com> wrote:

> On 8/10/21 7:51 PM, Celejar wrote:
> > On Tue, 10 Aug 2021 17:35:32 -0700
> > David Christensen <dpchrist@holgerdanske.com> wrote:
> > 
> >> On 8/10/21 12:56 PM, Dan Ritter wrote:
> >>> David Christensen wrote:
> >>>> On 8/10/21 8:04 AM, Leandro Noferini wrote:
> >>>>
> >>>> https://wiki.debian.org/ZFS
> > 
> > ...
> > 
> >>>> - ECC memory is safer than non-ECC memory.
> >>>
> >>> This is true, but there is nothing that makes ZFS more dangerous
> >>> than another filesystem using non-ECC memory.
> >>
> >>
> >> I think the amount of danger depends upon how you do your risk
> >> assessment math.  I find used entry-level server hardware with ECC
> >> memory to be desirable for additional reasons.
> > 
> > Dan's point is that while ECC memory is indeed safer than non-ECC
> > memory, this is true whether one is using ZFS or some other filesystem;
> > furthermore, with or without ECC memory, there's no reason to believe
> > that ZFS is less safe than the alternative.
> > 
> > See:
> > 
> > https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/?comments=1&post=38877683
> > https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/
> > 
> > So while ECC memory is always good, it's not a consideration when
> > trying to choose between ZFS and other filesystems.
> 
> 
> I see two sets of choices:
> 
> 1.  Memory integrity:
> 
>      a.  No error checking or correcting -- non-ECC.
> 
>      b.  Error checking and correcting -- ECC.
> 
> 2.  Operating system storage stack data integrity:
> 
>      a.  No data integrity -- md, LVM, ext*, FAT, NTFS.
> 
>      b.  Data integrity -- dm-integrity, btrfs, ZFS.
> 
> 
> There are four combinations of the above.  I order them from highest 
> risk (A) to lowest risk (D) as follows:
> 
> A.  Non-ECC memory (1a) and data integrity (2b)
> 
> B.  Non-ECC memory (1a) and no data integrity (2a)
> 
> C.  ECC memory (1b) and no data integrity (2a)
> 
> D.  ECC memory (1b) and data integrity (2b)
> 
> 
> I have seen a few computers with failing non-ECC memory and no OS 
> storage stack data integrity (case B).  It might take weeks or months to 
> identify the problem.  If those computers had had OS storage stack data 
> integrity with automatic correction (case A), the "scrub of death" is 
> the logical outcome (failure modes and effects analysis); it's just a 
> question of time.  Given the eventual catastrophic outcome (fault hazard 
> analysis), I see a significant difference in risk between A and B.

I have no personal experience with, or deep understanding of, these
issues myself, but the experts do not accept your position that A is
higher risk than B due to the possibility of the "scrub of death."
Here's Jim Salter (from the second link I gave above):

> Is ZFS and non-ECC worse than not-ZFS and non-ECC? What about the Scrub
> of Death?
> 
> OK, it’s pretty easy to demonstrate that a flipped bit in RAM means
> data corruption: if you write that flipped bit back out to disk,
> congrats, you just wrote bad data. There’s no arguing that. The real
> issue here isn’t whether ECC is good to have, it’s whether non-ECC is
> particularly problematic with ZFS. The scenario usually thrown out is
> the much-dreaded Scrub Of Death.
> 
> TL;DR version of the scenario: ZFS is on a system with non-ECC RAM that
> has a stuck bit, its user initiates a scrub, and as a result of
> in-memory corruption good blocks fail checksum tests and are
> overwritten with corrupt data, thus instantly murdering an entire pool.
> As far as I can tell, this idea originates with a very prolific user on
> the FreeNAS forums named Cyberjock, and he lays it out in this thread
> here. It’s a scary idea – what if the very thing that’s supposed to
> keep your system safe kills it? A scrub gone mad! Nooooooo!
> 
> The problem is, the scenario as written doesn’t actually make sense.
> For one thing, even if you have a particular address in RAM with a
> stuck bit, you aren’t going to have your entire filesystem run through
> that address. That’s not how memory management works, and if it were
> how memory management works, you wouldn’t even have managed to boot the
> system: it would have crashed and burned horribly when it failed to
> load the operating system in the first place. So no, you might corrupt
> a block here and there, but you’re not going to wring the entire
> filesystem through a shredder block by precious block.
> 
> But we’re being cheap here. Say you only corrupt one block in 5,000
> this way. That would still be hellacious. So let’s examine the more
> reasonable idea of corrupting some data due to bad RAM during a scrub.
> And let’s assume that we have RAM that not only isn’t working 100%
> properly, but is actively goddamn evil and trying its naive but
> enthusiastic best to specifically kill your data during a scrub:
> 
> First, you read a block. This block is good. It is perfectly good data
> written to a perfectly good disk with a perfectly matching checksum.
> But that block is read into evil RAM, and the evil RAM flips some bits.
> Perhaps those bits are in the data itself, or perhaps those bits are in
> the checksum. Either way, your perfectly good block now does not appear
> to match its checksum, and since we’re scrubbing, ZFS will attempt to
> actually repair the “bad” block on disk. Uh-oh! What now?
> 
> Next, you read a copy of the same block – this copy might be a
> redundant copy, or it might be reconstructed from parity, depending on
> your topology. The redundant copy is easy to visualize – you literally
> stored another copy of the block on another disk. Now, if your evil RAM
> leaves this block alone, ZFS will see that the second copy matches its
> checksum, and so it will overwrite the first block with the same data
> it had originally – no data was lost here, just a few wasted disk
> cycles. OK. But what if your evil RAM flips a bit in the second copy?
> Since it doesn’t match the checksum either, ZFS doesn’t overwrite
> anything. It logs an unrecoverable data error for that block, and
> leaves both copies untouched on disk. No data has been corrupted. A
> later scrub will attempt to read all copies of that block and validate
> them just as though the error had never happened, and if this time
> either copy passes, the error will be cleared and the block will be
> marked valid again (with any copies that don’t pass validation being
> overwritten from the one that did).
> 
> Well, huh. That doesn’t sound so bad. So what does your evil RAM need
> to do in order to actually overwrite your good data with corrupt data
> during a scrub? Well, first it needs to flip some bits during the
> initial read of every block that it wants to corrupt. Then, on the
> second read of a copy of the block from parity or redundancy, it needs
> to not only flip bits, it needs to flip them in such a way that you get
> a hash collision. In other words, random bit-flipping won’t do – you
> need some bit flipping in the data (with or without some more
> bit-flipping in the checksum) that adds up to the corrupt data
> correctly hashing to the value in the checksum. By default, ZFS uses
> 256-bit SHA validation hashes, which means that a single bit-flip has a
> 1 in 2^256 chance of giving you a corrupt block which now matches its
> checksum. To be fair, we’re using evil RAM here, so it’s probably going
> to do lots of experimenting, and it will try flipping bits in both the
> data and the checksum itself, and it will do so multiple times for any
> single block. However, that’s multiple 1 in 2^256 (aka roughly 1 in
> 10^77) chances, which still makes it vanishingly unlikely to actually
> happen… and if your RAM is that damn evil, it’s going to kill your data
> whether you’re using ZFS or not. ...

[snipped the rest of Jim's analysis]
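
To make the decision logic in that passage concrete, here's a rough
sketch of what a scrub does with each block and its redundant copies,
as Jim describes it. This is just my own illustration in Python, not
actual ZFS code, and the names are made up:

    import hashlib

    def checksum(data: bytes) -> bytes:
        # A 256-bit validation hash (SHA-256, per Jim's description above).
        return hashlib.sha256(data).digest()

    def scrub_block(copies, stored_checksum):
        # 'copies' are the block's redundant copies as read into RAM,
        # where bad RAM may have flipped bits in any of them.
        good = [c for c in copies if checksum(c) == stored_checksum]
        if not good:
            # No copy validates: log an unrecoverable error and leave
            # every on-disk copy untouched; a later scrub may clear it.
            return "logged error; disk left untouched"
        if len(good) < len(copies):
            # At least one copy validates: rewrite the failing copies
            # from a validated one.  Bad RAM can only turn this into a
            # corrupting write if the in-memory corruption *still*
            # matches the 256-bit checksum, i.e. a hash collision.
            return "repaired failing copies from a validated copy"
        return "all copies validate; nothing to do"

As I understand it, the point is that the only data a scrub ever writes
back is data that has just passed the checksum test.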

> I don’t care about your logic! I wish to appeal to authority!
> 
> OK. “Authority” in this case doesn’t get much better than Matthew
> Ahrens, one of the cofounders of ZFS at Sun Microsystems and current
> ZFS developer at Delphix. In the comments to one of my filesystem
> articles on Ars Technica, Matthew said “There’s nothing special about
> ZFS that requires/encourages the use of ECC RAM more so than any other
> filesystem.”
> 
> Hope that helps. =)
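
For what it's worth, Jim's "1 in 2^256 is roughly 1 in 10^77" figure is
easy to check with two lines of Python (my own sanity check, nothing
more):

    import math
    print(math.log10(2 ** 256))   # ~77.06, so 2^256 is about 1.2 x 10^77

A random bit-flip surviving a 256-bit checksum really is vanishingly
unlikely, which is the crux of his argument.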

Celejar

