
Re: Asus K8N, Cool'n'Quiet, and sensors.conf



On Thu, Oct 06, 2005 at 06:06:39PM -0300, Peter Cordes wrote:
>  I'd been ignoring my Debian inbox for a long time until today...
> 
>  I have an Asus K8V (basic) with an Athlon64 3200+ (newcastle core) 1.5GB of
> RAM, two IDE disks, two SATA disks, and an ATI AIW Radeon 7200 (but I don't
> use the TV in/out features).  I run x86 2.6.12.6.  I'll eventually switch to
> AMD64 software when I know my hardware is stable with x86, so I can usefully
> make bug reports on crashing software...
> 
>  BTW, the K8V is a nice piece of hardware; the AD1980 sound hardware
> supports mixing PCM streams in hardware (or at least the driver does?), so I
> can have xmms, xine, and whatever other program all not interfering with
> each other.  (except when something is doing 4 channel output).  I couldn't
> decide between an Abit (I think) with a K8T800Pro chipset and my Asus with
> just K8T800, but I eventually chose the Asus because it had Analog Devices
> sound instead of Realtek.  I was pleasantly surprised that the sound really
> was good on it, esp. with the multiple opens of the sound dev :)
> 
>  I've found that my machine is a lot less stable when running at lower than
> max speed.  Not just stuff crashing, but memtest (from sysutils, or
> memtester; just mlock()s some memory to test, not like memtest86+).  memtest
> finds errors when the CPU is slowed down.  There might be other correlated
> factors, like disk access.  To change speed, I've just used cpufreq-set -u
> 2000MHz (or 1800MHz, or 1000MHz).  Max speed is 2200MHz.  (Newcastle core:
> from dmidecode:  ID: C0 0F 00 00 FF FB 8B 07
>                  Signature: Extended Family 0, Model C, Stepping 0
> )
> 
>  Unfortunately, the machine isn't perfectly stable even at max speed.  It
> never crashed before I upgraded the BIOS from 1.04 or something to 1.07,
> which was needed for cpufreq to work.  Even when running at 2.2GHz (full
> speed) with only one stick of RAM (1024MB OCZ), it sometimes shows a cluster
> of memory errors in memtest.  It doesn't seem significantly different from
> with both sticks of RAM, the other being a 512MB Infineon, IIRC.  All DDR400.
> 
> Run  129 completed in 357 seconds (0 tests showed errors).
> Run  130:
>   Test  1:         Stuck Address:  Testing...Passed.
>   Test  2:          Random value:  Setting...Testing...
> FAILURE: 0x7ffeebc8 != 0x7efeebc8 at offset 0x01ca67f0.
> Skipping to next test...
>   Test  3:        XOR comparison:  Setting...Testing...
> FAILURE: 0x42ff4c7a != 0x43ff4c7a at offset 0x01ca67f0.
> Skipping to next test...
>   Test  4:        SUB comparison:  Setting...Testing...
> FAILURE: 0x2707802e != 0x2807802e at offset 0x01ca67f0.
> Skipping to next test...
>   Test  5:        MUL comparison:  Setting...Testing...
> FAILURE: 0x73c9b7ae != 0xb4c9b7ae at offset 0x01ca67f0.
> Skipping to next test...
>   Test  6:        DIV comparison:  Setting...Testing...
> FAILURE: 0x00000000 != 0x00000001 at offset 0x01ca67f0.
> Skipping to next test...
>   Test  7:         OR comparison:  Setting...Testing...
> FAILURE: 0xb2dd75dc != 0xb2dd75dd at offset 0x01ca67f0.
> Skipping to next test...
>   Test  8:        AND comparison:  Setting...Testing...Passed.
>   Test  9:  Sequential Increment:  Setting...Testing...Passed.
>   Test 10:            Solid Bits:  Testing...Passed.


Well, seeing the same address fail each test is a bad sign.  Maybe one
stick of memory has some flaky bits.  That would make things unstable,
since it would work sometimes but not all the time.
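
A quick way to sanity-check that pattern is to pull the FAILURE lines out
of the log and compare them.  The following is only a rough Python sketch:
the regular expression assumes the FAILURE line format shown in the quoted
output, and the names parse_failures and distinct_offsets are made up for
illustration; they are not part of memtest.

```python
import re

# FAILURE lines copied from the memtest output quoted above.
LOG = """\
FAILURE: 0x7ffeebc8 != 0x7efeebc8 at offset 0x01ca67f0.
FAILURE: 0x42ff4c7a != 0x43ff4c7a at offset 0x01ca67f0.
FAILURE: 0x2707802e != 0x2807802e at offset 0x01ca67f0.
FAILURE: 0x73c9b7ae != 0xb4c9b7ae at offset 0x01ca67f0.
FAILURE: 0x00000000 != 0x00000001 at offset 0x01ca67f0.
FAILURE: 0xb2dd75dc != 0xb2dd75dd at offset 0x01ca67f0.
"""

# Assumed line format: "FAILURE: <value> != <value> at offset <offset>."
FAILURE_RE = re.compile(
    r"FAILURE: (0x[0-9a-fA-F]+) != (0x[0-9a-fA-F]+) at offset (0x[0-9a-fA-F]+)"
)


def parse_failures(text):
    """Return a list of (offset, differing_bits) tuples, one per FAILURE line."""
    failures = []
    for match in FAILURE_RE.finditer(text):
        left, right, offset = (int(group, 16) for group in match.groups())
        # XOR leaves a 1 in every bit position where the two values disagree.
        failures.append((offset, left ^ right))
    return failures


def distinct_offsets(failures):
    """Return the sorted set of offsets at which failures were reported."""
    return sorted({offset for offset, _ in failures})


if __name__ == "__main__":
    parsed = parse_failures(LOG)
    for offset, diff in parsed:
        print(f"offset 0x{offset:08x}  differing bits 0x{diff:08x}")
    print("distinct offsets:", [hex(o) for o in distinct_offsets(parsed)])
```

If distinct_offsets() comes back with a single entry, every failure hit the
same location, which is what the log above shows (0x01ca67f0).  Failures
spread over many unrelated offsets would point more toward the controller
or the power supply than toward one bad spot on a chip.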

> BIOS on all auto settings.
> no other runs showed errors (140 runs)
> 
>  (Running at lower CPU speeds, errors were much more frequent).

Not sure why it would, although if the memory is flaky, who knows.

>  Interesting that all the errors are clustered in time and space at one
> memory location...  As I said, the software running is Debian i386 sid with
> Linux 2.6.12.6, compiled with gcc 4.0.2 20050816 (from sid).

If there is a defect in a memory chip, it is quite likely to be
localized to one part of the die in the memory chip.

>  Does anyone have any ideas?  I hate hardware I can't trust!  What's the
> point of digital logic if it makes mistakes!
> 
>  So does anyone have any experience or advice?  

Try one stick of RAM at a time.  Most likely it is just one of the
sticks that has errors.  If you get errors in memtest with each stick
on its own, then the CPU may have a defective memory controller.

Another possibility is that your power supply is crap and isn't providing
steady enough power for the system.  Athlon 64s demand very reliable
power.  A cheap 500W supply often delivers less usable power than a good
300W one, because its voltage levels are unstable under load.

Len Sorensen


