
Re: Memory and other hardware safety issue regarding ZFS [was Disk for a small server]



Hi,

On 2021-08-11 5:53 a.m., David Christensen wrote:
> On 8/10/21 7:51 PM, Celejar wrote:
>> On Tue, 10 Aug 2021 17:35:32 -0700
>> David Christensen <dpchrist@holgerdanske.com> wrote:
>>
>>> On 8/10/21 12:56 PM, Dan Ritter wrote:
>>>> David Christensen wrote:
>>>>> On 8/10/21 8:04 AM, Leandro Noferini wrote:
>>>>>
>>>>> https://wiki.debian.org/ZFS
>>
>> ...
>>
>>>>> - ECC memory is safer than non-ECC memory.
>>>>
>>>> This is true, but there is nothing that makes ZFS more dangerous
>>>> than another filesystem using non-ECC memory.
>>>
>>>
>>> I think the amount of danger depends upon how you do your risk
>>> assessment math.  I find used entry-level server hardware with ECC
>>> memory to be desirable for additional reasons.
>>
>> Dan's point is that while ECC memory is indeed safer than non-ECC
>> memory, this is true whether one is using ZFS or some other filesystem;
>> furthermore, with or without ECC memory, there's no reason to believe
>> that ZFS is less safe than the alternative.
>>
>> See:
>>
>> https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/?comments=1&post=38877683
>>
>> https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/
>>
>> So while ECC memory is always good, it's not a consideration when
>> trying to choose between ZFS and other filesystems.
> 
> 
> I see two sets of choices:
> 
> 1.  Memory integrity:
> 
>     a.  No error checking or correcting -- non-ECC.
> 
>     b.  Error checking and correcting -- ECC.
> 
> 2.  Operating system storage stack data integrity:
> 
>     a.  No data integrity -- md, LVM, ext*, FAT, NTFS.
> 
>     b.  Data integrity -- dm-integrity, btrfs, ZFS.
> 
> 
> There are four combinations of the above.  I order them from highest
> risk (A) to lowest risk (D) as follows:
> 
> A.  Non-ECC memory (1a) and data integrity (2b)
> 
> B.  Non-ECC memory (1a) and no data integrity (2a)
> 
> C.  ECC memory (1b) and no data integrity (2a)
> 
> D.  ECC memory (1b) and data integrity (2b)
> 
> 
> I have seen a few computers with failing non-ECC memory and no OS
> storage stack data integrity (case B).  It might take weeks or months to
> identify the problem.  If those computers had had OS storage stack data
> integrity with automatic correction (case A), the "scrub of death" is
> the logical outcome (failure modes and effects analysis); it's just a
> question of time.  Given the eventual catastrophic outcome (fault hazard
> analysis), I see a significant difference in risk between A and B.
> 
> 
> I started buying ECC machines specifically for ZFS a few years ago (case
> D), and suffered through a rash of drive, rack, cable, and/or HBA
> failures.  Given RAID, ZFS snapshots, backups, etc., I replaced bad
> drives, fixed connections, resilvered, restored, verified, etc., with
> minimal loss.  If I had chosen md, LVM, and ext4 instead (case C), there
> would still be hardware checksums inside the drives, hardware checksums
> on the connections, and memory checksums.  So, the risk difference C-D
> is less pronounced than A-B.
> 
> 
> Holding the data integrity choice constant and comparing memory choices
> (cases A-D and cases B-C), I see more risk with non-ECC memory and less
> risk with ECC memory for both data integrity choices.
> 
> 
> So, I do consider memory when choosing the storage stack.  Furthermore,
> my OS storage stack data integrity choice with non-ECC memory is the
> opposite of my choice with ECC memory.  My desktops and laptops have
> non-ECC and ext4 (case B).  My servers have ECC and ZFS (case D).
> 
> 
> Therefore, my suggestion of ZFS on RPi contradicts my own practice.  :-/
> 
> 
> David
> 
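
The four cases quoted above can be enumerated mechanically from the two
choices.  A minimal sketch in Python, assuming nothing beyond the quoted
text: the dictionary names and the helper ranked_cases() are purely
illustrative, and the A-D labels only restate David's ordering (highest
risk first); they are not a measured or computed risk.

```python
from itertools import product

# Choice 1: memory integrity; choice 2: storage stack data integrity.
MEMORY = {"1a": "non-ECC", "1b": "ECC"}
STORAGE = {"2a": "no data integrity", "2b": "data integrity"}

# David's stated ordering, highest risk (A) to lowest risk (D).
# These labels are copied from the quoted message, not derived.
RANKING = {
    ("1a", "2b"): "A",
    ("1a", "2a"): "B",
    ("1b", "2a"): "C",
    ("1b", "2b"): "D",
}


def ranked_cases():
    """Return every (label, memory, storage) combination, sorted by label."""
    cases = [
        (RANKING[(m, s)], MEMORY[m], STORAGE[s])
        for m, s in product(MEMORY, STORAGE)
    ]
    return sorted(cases)


for label, memory, storage in ranked_cases():
    print(f"{label}: {memory} memory, {storage}")
```

Swapping the labels on the first two entries of RANKING gives the
ordering argued for by Dan and Celejar, where the storage stack choice
does not make non-ECC memory any more dangerous.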

-- 
Polyna-Maude R.-Summerside
-Be smart, Be wise, Support opensource development
