
Asus K8V Cool'n'Quiet, problems solved



On Wed, Oct 12, 2005 at 08:54:49AM -0400, Lennart Sorensen wrote:

 I've once again been studiously ignoring my Debian mailbox for a long time...
 I thought I should reply to this for the benefit of the list archive, anyway.

 I eventually found a PDF on AMD's web site (the BIOS and Kernel guide, doc
#26094)
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_7203,00.html
that suggested that the Athlon64 (Socket 754 version) memory controller can
only run at DDR333 with two double-sided DIMMs, which is what I have.  No
wonder I was having problems! Setting DDR333 in the BIOS has made my machine
rock-solid stable, even when running for months at 1000MHz at 1.1V.

 My theory on why I only had problems at lower CPU speeds is that the lower
CPU voltage that goes with them either reduced the drive strength of the
memory controller, or lengthened the rise and fall times of the signals
enough that they weren't quite crisp enough for the RAM I have.

 With only one or the other DIMM, I could run at DDR400 at any CPU speed.

 So, problem solved.  At DDR333, the memory timings are a little faster, so
it's not too bad at all.  I haven't really benchmarked, though.
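
 As a rough illustration of why tighter timings can offset the lower clock,
here is a small sketch that converts CAS latency from cycles to nanoseconds
and computes peak bandwidth for a 64-bit single-channel bus.  The CAS values
used (3 at DDR400, 2.5 at DDR333) are assumptions for the example; the post
does not say what timings the BIOS actually selected.

```python
# Memory clock in MHz for each DDR speed grade (data rate is twice the clock).
CLOCK_MHZ = {"DDR400": 200.0, "DDR333": 500.0 / 3.0}


def cycle_ns(grade):
    """Length of one memory clock cycle in nanoseconds."""
    return 1000.0 / CLOCK_MHZ[grade]


def cas_ns(grade, cas_cycles):
    """CAS latency in nanoseconds for a given latency in clock cycles."""
    return cas_cycles * cycle_ns(grade)


def peak_mb_per_s(grade, bus_bytes=8):
    """Theoretical peak bandwidth: two transfers per clock, 8 bytes each."""
    return CLOCK_MHZ[grade] * 2 * bus_bytes


# Hypothetical timings, chosen only to illustrate the arithmetic.
for grade, cas in (("DDR400", 3), ("DDR333", 2.5)):
    print(f"{grade} CL{cas}: {cas_ns(grade, cas):.1f} ns CAS, "
          f"{peak_mb_per_s(grade):.0f} MB/s peak")
```

 With these assumed timings the CAS latency in nanoseconds comes out the
same at both speeds, so only peak bandwidth drops (by about one sixth).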

 Thanks for all the suggestions.  I read them at the time, and they were
useful, but I didn't get around to replying until I had the system stable.
BTW, I have an Enermax 465W power supply, so it's _solid_.

> On Thu, Oct 06, 2005 at 06:06:39PM -0300, Peter Cordes wrote:
> >  I'd been ignoring my Debian inbox for a long time until today...
> > 
> >  I have an Asus K8V (basic) with an Athlon64 3200+ (Newcastle core), 1.5GB
> > of RAM, two IDE disks, two SATA disks, and an ATI AIW Radeon 7200 (but I
> > don't use the TV in/out features).  I run x86 Linux 2.6.12.6.  I'll
> > eventually switch to AMD64 software once I know my hardware is stable with
> > x86, so I can make useful bug reports about crashing software...
> > 
> >  BTW, the K8V is a nice piece of hardware; the AD1980 sound hardware
> > supports mixing PCM streams in hardware (or at least the driver does?), so I
> > can have xmms, xine, and whatever other program all not interfering with
> > each other.  (except when something is doing 4 channel output).  I couldn't
> > decide between an Abit (I think) with a K8T800Pro chipset and my Asus with
> > just K8T800, but I eventually chose the Asus because it had Analog Devices
> > sound instead of Realtek.  I was pleasantly surprised that the sound really
> > was good on it, esp. with the multiple opens of the sound dev :)
> > 
> >  I've found that my machine is a lot less stable when running at lower than
> > max speed.  It's not just stuff crashing: memtest (from sysutils, or
> > memtester; it just mlock()s some memory to test, unlike memtest86+) finds
> > errors when the CPU is slowed down.  There might be other correlated
> > factors, like disk access.  To change speed, I've just used cpufreq-set -u
> > 2000MHz (or 1800MHz, or 1000MHz).  Max speed is 2200MHz.  (Newcastle core:
> > from dmidecode:  ID: C0 0F 00 00 FF FB 8B 07
> >                  Signature: Extended Family 0, Model C, Stepping 0
> > )
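
 For readers who want to check the dmidecode output quoted above: a minimal
sketch of decoding the ID bytes, assuming (as is usual for x86 in dmidecode)
that the first four bytes are the little-endian EAX value returned by CPUID
function 1.  The function name is made up for this example.

```python
def decode_cpuid_signature(id_bytes):
    """Split the CPUID signature (first 4 ID bytes) into its bit fields."""
    raw = bytes.fromhex(id_bytes.replace(" ", ""))
    eax = int.from_bytes(raw[:4], "little")
    return {
        "stepping": eax & 0xF,            # bits 3:0
        "model": (eax >> 4) & 0xF,        # bits 7:4
        "family": (eax >> 8) & 0xF,       # bits 11:8
        "ext_model": (eax >> 16) & 0xF,   # bits 19:16
        "ext_family": (eax >> 20) & 0xFF, # bits 27:20
    }


sig = decode_cpuid_signature("C0 0F 00 00 FF FB 8B 07")
print({name: hex(value) for name, value in sig.items()})
```

 This gives base family 0xF with extended family 0, model 0xC and stepping 0,
which agrees with the Signature line dmidecode printed.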
> > 
> >  Unfortunately, the machine isn't perfectly stable even at max speed.  It
> > never crashed before I upgraded the BIOS from 1.04 or something to 1.07,
> > which was needed for cpufreq to work.  Even when running at 2.2GHz (full
> > speed) with only one stick of RAM (1024MB OCZ), it sometimes shows a cluster
> > of memory errors in memtest.  The behaviour isn't significantly different
> > with both sticks installed, the other being a 512MB Infineon, IIRC.  Both
> > are DDR400.
> > 
> > Run  129 completed in 357 seconds (0 tests showed errors).
> > Run  130:
> >   Test  1:         Stuck Address:  Testing...Passed.
> >   Test  2:          Random value:  Setting...Testing...
> > FAILURE: 0x7ffeebc8 != 0x7efeebc8 at offset 0x01ca67f0.
> > Skipping to next test...
> >   Test  3:        XOR comparison:  Setting...Testing...
> > FAILURE: 0x42ff4c7a != 0x43ff4c7a at offset 0x01ca67f0.
> > Skipping to next test...
> >   Test  4:        SUB comparison:  Setting...Testing...
> > FAILURE: 0x2707802e != 0x2807802e at offset 0x01ca67f0.
> > Skipping to next test...
> >   Test  5:        MUL comparison:  Setting...Testing...
> > FAILURE: 0x73c9b7ae != 0xb4c9b7ae at offset 0x01ca67f0.
> > Skipping to next test...
> >   Test  6:        DIV comparison:  Setting...Testing...
> > FAILURE: 0x00000000 != 0x00000001 at offset 0x01ca67f0.
> > Skipping to next test...
> >   Test  7:         OR comparison:  Setting...Testing...
> > FAILURE: 0xb2dd75dc != 0xb2dd75dd at offset 0x01ca67f0.
> > Skipping to next test...
> >   Test  8:        AND comparison:  Setting...Testing...Passed.
> >   Test  9:  Sequential Increment:  Setting...Testing...Passed.
> >   Test 10:            Solid Bits:  Testing...Passed.
> 
> 
> Well, seeing the same address fail each test is a bad sign.  Maybe one
> stick of memory has some flaky bits.  That would make things unstable
> since it would work sometimes, but not all the time.
> 
> > BIOS on all auto settings.
> > no other runs showed errors (140 runs)
> > 
> >  (Running at lower CPU speeds, errors were much more frequent).
> 
> Not sure why it would, although if the memory is flaky, who knows.
> 
> >  Interesting that all the errors are clustered in time and space at one
> > memory location...  As I said, the software running is Debian i386 sid with
> > Linux 2.6.12.6, compiled with gcc 4.0.2 20050816 (from sid).
> 
> If there is a defect in a memory chip, it is quite likely to be
> localized to one part of the die in the memory chip.
> 
> >  Does anyone have any ideas?  I hate hardware I can't trust!  What's the
> > point of digital logic if it makes mistakes!
> > 
> >  So does anyone have any experience or advice?  
> 
> Try one stick of RAM at a time.  Most likely it is just one of the
> sticks that has errors.  If you get errors in memtest with each of them,
> then the CPU may have a defective memory controller.
> 
> Another possibility is that your power supply is crap and isn't providing
> steady enough power for the system.  Athlon 64s demand very reliable
> power.  A cheap 500W supply often delivers less usable power than a good
> 300W one because its voltage levels are unstable under load.
> 
> Len Sorensen
> 
> 

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC


