
Re: ECC RAM failure data - jre



>---- Original Message ----
>From: dtutty@vianet.ca
>To: debian-user@lists.debian.org
>Subject: Re: ECC RAM failure data   - jre
>Date: Thu, 26 Feb 2009 14:28:43 -0500
>
>>On Thu, Feb 26, 2009 at 03:19:56AM -0800, john_re wrote:
>>> Do you use ECC RAM? Do you have any data about failure rates?
>>> 
>>> I'm evaluating this for a system with 8GB DRAM, &
>>> http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction
>>> says
>>> "Tests[ecc] give widely varying error rates, but about
>>> 10^-12 upset/bit-hr is typical, roughly one bit error, per month,
>>> per gigabyte of memory.
>>> 
>>> In most computers used for serious scientific or financial
>>> computing and as servers, ECC is the rule rather than the
>>> exception, as can be seen by examining manufacturers'
>>> specifications."
>>> 
>>> 
>>> So, by that data, 8GB of DRAM would see about 8 errors per month,
>>> i.e. about one every 3-4 days.
>>> 
>>> What rates do you have?
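
The arithmetic in the question can be checked with a short sketch. It assumes the quoted rate reads as 1e-12 upsets per bit-hour, a binary gigabyte, and an average month of 730 hours; the function names are made up for illustration. Taken literally, the per-bit-hour figure gives a count several times higher than the one-per-GB-per-month rule of thumb, so either number is best read as an order-of-magnitude guide.

```python
# Back-of-the-envelope soft-error estimate. Every constant here is an
# assumption for illustration, not measured data.
BITS_PER_GB = 8 * 2**30      # binary gigabyte, expressed in bits
HOURS_PER_MONTH = 730        # average month: 365 * 24 / 12


def errors_per_month(rate_per_bit_hour, gigabytes):
    """Expected upsets per month for a given per-bit-hour upset rate."""
    return rate_per_bit_hour * BITS_PER_GB * gigabytes * HOURS_PER_MONTH


def days_between_errors(per_month, days_per_month=30):
    """Mean spacing between errors, in days."""
    return days_per_month / per_month


# Rule of thumb from the quote: 1 error per GB per month, scaled to 8 GB.
rule_of_thumb = 1 * 8
print("rule of thumb:", rule_of_thumb, "errors/month, one every",
      days_between_errors(rule_of_thumb), "days")

# Per-bit-hour figure, read as 1e-12 (an assumption about the quoted rate).
literal = errors_per_month(1e-12, 8)
print("per-bit-hour rate:", round(literal, 1), "errors/month")
```

The first figure reproduces the "one per 3-4 days" estimate; the second shows how sensitive the result is to which of the two quoted numbers is used.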
>>
>>I don't know.  Does a box with ECC tell you?
>>
>>A non-ECC box that has an error may just show up as a random
>>non-reproducible error of a range of severity.  A piece of software
>>may crash, a comma turn into a period in a letter you're writing,
>>who knows.  I think it's the "who knows" factor that makes ECC
>>worth it in some applications.
>>
>>Doug.
>>
Most of the errors ECC is designed to correct are single-bit errors
that leave no lasting damage: once the cell is rewritten, the error
is gone ("soft" errors).  The usual culprit is alpha particles, which
hit RAM cells all the time; some carry enough energy to flip the
stored bit.  Because such hits usually upset only one bit at a time,
most ECC is single-bit error correction.
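
To make "single-bit error correction" concrete, here is a minimal sketch using a Hamming(7,4) code. It is a toy: ECC memory typically protects much wider words with more check bits, and the function names below are invented for this example. The principle is the same, though: parity bits are arranged so that the pattern of failed checks (the syndrome) spells out the position of the flipped bit.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4    # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4    # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4    # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]   # positions 1..7


def hamming74_correct(c):
    """Return (corrected codeword, error position); 0 means no error seen."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3    # syndrome read as a 1-based bit position
    fixed = list(c)
    if pos:
        fixed[pos - 1] ^= 1       # flip the offending bit back
    return fixed, pos


def hamming74_decode(c):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
word = hamming74_encode(data)
word[4] ^= 1                      # simulate a soft error in position 5
fixed, pos = hamming74_correct(word)
print("error at position", pos, "recovered", hamming74_decode(fixed))
```

A code like this corrects any single flipped bit per word, but two flips in the same word would be miscorrected, which is why practical schemes usually add an extra parity bit to at least detect double errors.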
Larry