
Re: Server hardware advice.



On 8/8/19 7:22 AM, Dan Ritter wrote:
> To summarize: if you're running ZFS, it can protect you from
> lots of sources of data corruption. It can't protect you from
> RAM errors without ECC, so you should opt for ECC if integrity
> is your goal.
>
> None of the other filesystems protect you against RAM errors
> either, so this is not a special requirement of ZFS.

+1


The same goes for anything that uses main memory, which is pretty much everything I use computers for.


Bad data in memory is bad enough, but bad data written to disk is the gift that keeps on giving -- replication overwriting good data, snapshot and backup rotation overwriting good data, archive destruction destroying good data, etc. The longer it takes to figure out that the data is bad, the less likely you are to recover it.


For me, the key points in favor of ECC are:

1. Wikipedia gives DRAM bit error rates (BER) from 10^-10 to 10^-17 errors per bit per hour [1]. At those rates, averaging one error per year takes only about 143 kB of DRAM at the pessimistic end, and about 1.43 TB at the optimistic end, under some test conditions (a back-of-the-envelope check follows below).

2. In the wild, not all chips, modules, sockets, capacitors, motherboards, etc., are healthy or compatible. Real BERs can be much higher.

3. The BER of DRAM tends to increase as the transistors, capacitors, lines, etc., get smaller and faster [2]. Given Moore's Law, manufacturers must be hard pressed just to maintain the BER with each new generation.

4. Moore's Law again: the amount of DRAM in devices has been increasing exponentially, adding ever more bits that can go bad.


So, it is just a matter of time before the expected error rate per machine becomes significant. One article I read said desktops and laptops already crossed that line at 8 to 16 GB. COTS servers can carry one or two orders of magnitude more memory.
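
To put numbers on point 1, here is a quick back-of-the-envelope sketch in Python. The BER range is from [1]; the unit conversions (8,766 hours per year, 8 bits per byte) and the assumption that errors are independent and uniform per bit are mine, not from the thread.

HOURS_PER_YEAR = 365.25 * 24  # ~8766

def errors_per_year(dram_bytes, ber):
    """Expected bit errors per year, assuming independent errors,
    uniform per bit, at `ber` errors/bit/hour."""
    return dram_bytes * 8 * ber * HOURS_PER_YEAR

def bytes_per_annual_error(ber):
    """DRAM size that averages one bit error per year at `ber`."""
    return 1.0 / (ber * 8 * HOURS_PER_YEAR)

for ber in (1e-10, 1e-17):
    print(f"BER {ber:.0e}: one error/year per "
          f"{bytes_per_annual_error(ber):.2e} bytes")
# BER 1e-10: one error/year per ~1.43e+05 bytes (~143 kB)
# BER 1e-17: one error/year per ~1.43e+12 bytes (~1.43 TB)

gib16 = 16 * 2**30  # a typical desktop today
print(f"16 GiB: {errors_per_year(gib16, 1e-10):.0f} errors/year worst case, "
      f"{errors_per_year(gib16, 1e-17):.3f} best case")
# ~120,000 errors/year at the pessimistic end; ~0.012/year
# (one every ~80 years) at the optimistic end.

The seven-orders-of-magnitude spread is the point: where a given machine actually falls depends on the factors in points 2 and 3, which is why real-world measurements like [2] matter more than the datasheet range.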


David


[1] https://en.wikipedia.org/wiki/Dynamic_random-access_memory

[2] https://danluu.com/why-ecc/

