
Re: Lenny, apt(titude), dependency issues, seg faults and corrupted .DEB packages



On Sat, 15 Nov 2008, Mark Allums wrote:
>>>>> Memtest86+ is a GPL'd memory testing suite that should work with
>>>>> anything in the i386-amd64 family.
>>>> Yeah, but it's not actually that good at testing memory.
>>>  Do you have a reference for that?

I do.  We never trust a memtest negative unless the test has been running
for over 48H; we have seen several false negatives.

And I have never seen it manage to flag an error on ECC systems.  I am not
really sure it knows how to deal with every memory controller in use on
servers out there, or that it DOES know how to talk properly to, say, an
i82875 like the one in my desktop.

False positives, OTOH, are unheard of.  If it says memory is bad, memory IS
bad.  Of course, it could be bad for a number of reasons, of which a bad
memory chip on the module is only the most likely.  We have seen it happen
due to peripherals that were disturbing the DC power lines, and due to
faulty PSUs (for the same reason).

>> Nope.  I also can't find one after a bit of googling.  I seem to 
>> remember that there were two or 3 people on gentoo-user that were able 
>> to find memory errors with a perl script that memtest86 couldn't find.  
>> Perhaps those were actually CPU errors with similar symptoms.

On x86 (and AFAIK, amd64) that can easily happen due to address aliasing
across different cache configurations or due to PCI horkage, yes.  It is one
of those kernel developers' nightmares.
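For anyone who has not seen such a script, the gist of a userspace pattern
test can be sketched in Python.  This is a hypothetical illustration only:
it is NOT the perl script mentioned above (its contents were never posted
here) and NOT memtest86+'s algorithm; the names check_fixed_patterns,
check_own_address and run are made up.  It only exercises whatever pages
the kernel happens to hand the process, which is one reason userspace tests
and memtest86+ can disagree.

```python
"""Hypothetical userspace memory pattern check (illustration only; not
memtest86+ and not the perl script mentioned in the thread)."""
from array import array

# Classic fixed byte patterns: all zeros, all ones, alternating bits.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def check_fixed_patterns(buf: bytearray) -> int:
    """Fill buf with each pattern, read it back, return mismatched bytes."""
    errors = 0
    for p in PATTERNS:
        buf[:] = bytes([p]) * len(buf)      # write pass
        errors += len(buf) - buf.count(p)   # verify pass: count bytes != p
    return errors


def check_own_address(words: int) -> int:
    """Store each 64-bit word's index in itself, then verify.

    This mimics the idea behind "address" tests, which can expose
    address-line faults that fixed patterns miss.
    """
    arr = array('Q', range(words))          # write pass: word i holds i
    return sum(1 for i, v in enumerate(arr) if v != i)


def run(mebibytes: int = 16, passes: int = 1) -> int:
    """Run both checks over a buffer of the given size; return error count."""
    size = mebibytes * 1024 * 1024
    buf = bytearray(size)
    total = 0
    for _ in range(passes):
        total += check_fixed_patterns(buf)
        total += check_own_address(size // 8)
    return total


if __name__ == "__main__":
    print("mismatches:", run())
```

On healthy hardware this should report zero mismatches.  A nonzero count
points at *something* (RAM, cache, chipset, power) rather than proving a
bad DIMM, in line with the caveats above.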

> MS's memory tester is pretty terrible IMO, but the memtests are a little  

In the opinion of some of our lab techs, it flags errors faster than memtest.
Probably memtest is too conservative in how it exercises the platform.

> better.  At least, I was able to find errors with memtest86+ that MS  
> didn't find.  (Could have been false positives of some kind, I suppose.)  

Interesting.  I will relay that to our lab people.  It probably means they
need to do both: 12H under MS, 24H or more under memtest.  Ugh.

> One must always remember to use the latest version, sometimes the newer  
> CPUs and chipsets aren't supported properly and there can be very subtle  
> bugs.

Yes, and to always use a verified-good gcc.  A memtest miscompiled by a
buggy gcc in Gentoo caused a *LOT* of problems some time ago...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

