Re: Asus K8N, Cool'n'Quiet, and sensors.conf
On Thu, Oct 06, 2005 at 06:06:39PM -0300, Peter Cordes wrote:
> I'd been ignoring my Debian inbox for a long time until today...
>
> I have an Asus K8V (basic) with an Athlon64 3200+ (Newcastle core), 1.5GB of
> RAM, two IDE disks, two SATA disks, and an ATI AIW Radeon 7200 (but I don't
> use the TV in/out features). I run x86 2.6.12.6. I'll eventually switch to
> AMD64 software when I know my hardware is stable with x86, so I can usefully
> make bug reports on crashing software...
>
> BTW, the K8V is a nice piece of hardware; the AD1980 sound hardware
> supports mixing PCM streams in hardware (or at least the driver does?), so I
> can have xmms, xine, and whatever other program all not interfering with
> each other. (except when something is doing 4 channel output). I couldn't
> decide between an Abit (I think) with a K8T800Pro chipset and my Asus with
> just K8T800, but I eventually chose the Asus because it had Analog Devices
> sound instead of Realtek. I was pleasantly surprised that the sound really
> was good on it, esp. with the multiple opens of the sound dev :)
>
> I've found that my machine is a lot less stable when running at lower than
> max speed. Not just stuff crashing, but memtest (from sysutils, or
> memtester; just mlock()s some memory to test, not like memtest86+). memtest
> finds errors when the CPU is slowed down. There might be other correlated
> factors, like disk access. To change speed, I've just used cpufreq-set -u
> 2000MHz (or 1800MHz, or 1000MHz). Max speed is 2200MHz. (Newcastle core:
> from dmidecode: ID: C0 0F 00 00 FF FB 8B 07
> Signature: Extended Family 0, Model C, Stepping 0
> )
>
> Unfortunately, the machine isn't perfectly stable even at max speed. It
> never crashed before I upgraded the BIOS from 1.04 or something to 1.07,
> which was needed for cpufreq to work. Even when running at 2.2GHz (full
> speed) with only one stick of RAM (1024MB OCZ), it sometimes shows a cluster
> of memory errors in memtest. It doesn't seem significantly different from
> with both sticks of RAM, the other being a 512MB Infineon, IIRC. All DDR400.
>
> Run 129 completed in 357 seconds (0 tests showed errors).
> Run 130:
> Test 1: Stuck Address: Testing...Passed.
> Test 2: Random value: Setting...Testing...
> FAILURE: 0x7ffeebc8 != 0x7efeebc8 at offset 0x01ca67f0.
> Skipping to next test...
> Test 3: XOR comparison: Setting...Testing...
> FAILURE: 0x42ff4c7a != 0x43ff4c7a at offset 0x01ca67f0.
> Skipping to next test...
> Test 4: SUB comparison: Setting...Testing...
> FAILURE: 0x2707802e != 0x2807802e at offset 0x01ca67f0.
> Skipping to next test...
> Test 5: MUL comparison: Setting...Testing...
> FAILURE: 0x73c9b7ae != 0xb4c9b7ae at offset 0x01ca67f0.
> Skipping to next test...
> Test 6: DIV comparison: Setting...Testing...
> FAILURE: 0x00000000 != 0x00000001 at offset 0x01ca67f0.
> Skipping to next test...
> Test 7: OR comparison: Setting...Testing...
> FAILURE: 0xb2dd75dc != 0xb2dd75dd at offset 0x01ca67f0.
> Skipping to next test...
> Test 8: AND comparison: Setting...Testing...Passed.
> Test 9: Sequential Increment: Setting...Testing...Passed.
> Test 10: Solid Bits: Testing...Passed.
Well, seeing the same address fail in every test is a bad sign. Maybe one
stick of memory has some flaky bits. That would make the system unstable,
since it would work sometimes, but not all the time.
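A quick way to confirm that pattern in a longer log is to tally the FAILURE
lines by offset. This is a minimal sketch that assumes the log uses the exact
line format shown in run 130 above. Keep in mind that the offset is a position
inside memtester's locked test buffer, not a physical address, so it only
pins down a location for the duration of one memtester run.

```python
import re
from collections import Counter

# Matches memtester failure lines such as:
#   FAILURE: 0x7ffeebc8 != 0x7efeebc8 at offset 0x01ca67f0.
FAILURE_RE = re.compile(
    r"FAILURE: (0x[0-9a-f]+) != (0x[0-9a-f]+) at offset (0x[0-9a-f]+)"
)

def failures_by_offset(log_text):
    """Count memtester FAILURE lines per buffer offset."""
    counts = Counter()
    for match in FAILURE_RE.finditer(log_text):
        counts[int(match.group(3), 16)] += 1
    return counts

# The six failures reported in run 130.
run_130 = """
FAILURE: 0x7ffeebc8 != 0x7efeebc8 at offset 0x01ca67f0.
FAILURE: 0x42ff4c7a != 0x43ff4c7a at offset 0x01ca67f0.
FAILURE: 0x2707802e != 0x2807802e at offset 0x01ca67f0.
FAILURE: 0x73c9b7ae != 0xb4c9b7ae at offset 0x01ca67f0.
FAILURE: 0x00000000 != 0x00000001 at offset 0x01ca67f0.
FAILURE: 0xb2dd75dc != 0xb2dd75dd at offset 0x01ca67f0.
"""

for offset, n in failures_by_offset(run_130).items():
    print(f"offset 0x{offset:08x}: {n} failures")
```

For run 130 this prints a single line, offset 0x01ca67f0 with 6 failures.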
> BIOS on all auto settings.
> no other runs showed errors (140 runs)
>
> (Running at lower CPU speeds, errors were much more frequent).
Not sure why it would, although if the memory is flaky, who knows.
> Interesting that all the errors are clustered in time and space at one
> memory location... As I said, the software running is Debian i386 sid with
> Linux 2.6.12.6, compiled with gcc 4.0.2 20050816 (from sid).
If there is a defect in a memory chip, it is quite likely to be
localized to one part of the die in the memory chip.
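XORing the two values memtester compares shows which bits differ. This is a
sketch applied only to the first two failures (Random value and XOR
comparison), since those tests do no arithmetic on the data. In both, the
values differ in exactly one bit, bit 24, which fits a single bad cell or
data line rather than random noise. The later SUB/MUL/DIV tests do arithmetic
on the values, so one flipped bit can spread to several, and their XOR
patterns are harder to read.

```python
def flipped_bits(a, b, width=32):
    """Return the bit positions (0 = least significant) where two words differ."""
    diff = (a ^ b) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]

# First two failing comparisons from run 130.
pairs = {
    "Random value": (0x7ffeebc8, 0x7efeebc8),
    "XOR comparison": (0x42ff4c7a, 0x43ff4c7a),
}
for name, (a, b) in pairs.items():
    print(f"{name}: xor=0x{a ^ b:08x}, bits {flipped_bits(a, b)}")
```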
> Does anyone have any ideas? I hate hardware I can't trust! What's the
> point of digital logic if it makes mistakes!
>
> So does anyone have any experience or advice?
Try one stick of RAM at a time. Most likely just one of the sticks has
errors. If you get errors in memtest with each stick on its own, then the
CPU may have a defective memory controller (the Athlon 64 has the memory
controller on the CPU die).
Another possibility is that your power supply is crap and isn't providing
steady enough power for the system. Athlon 64s demand very reliable power.
A cheap 500W supply often delivers less usable power than a good 300W one,
because its voltage levels become unstable under load.
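One rough check is to read the rail voltages from lm-sensors while the
machine is under load (e.g. during a memtester run) and compare them with the
+/-5% regulation window commonly cited for the ATX +12V, +5V and +3.3V rails.
This is a sketch. The readings below are made-up numbers for illustration,
motherboard sensor chips are only roughly calibrated (which is what the
compute lines in sensors.conf correct for), and a slow sensor poll will not
catch fast ripple, so passing this check does not prove the PSU is good.

```python
# +/-5% window commonly cited for the main ATX rails (assumption: check the
# spec revision that applies to your supply).
ATX_TOLERANCE_PCT = 5.0

def rail_ok(nominal, measured, tolerance_pct=ATX_TOLERANCE_PCT):
    """True if a measured rail voltage lies within nominal +/- tolerance."""
    margin = abs(nominal) * tolerance_pct / 100.0
    return nominal - margin <= measured <= nominal + margin

# Hypothetical readings taken under load; not real measurements.
readings = {12.0: 11.31, 5.0: 4.97, 3.3: 3.28}
for nominal, measured in readings.items():
    status = "ok" if rail_ok(nominal, measured) else "OUT OF SPEC"
    print(f"{nominal:+.1f} V rail: {measured:.2f} V {status}")
```

With these made-up numbers the +12V rail (11.31 V, below the 11.4 V floor)
is flagged and the other two pass.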
Len Sorensen