Fwd: Failure to load amd64 overcome, though mem problems
Posted again from the e-mail address I am registered with
---------- Forwarded message ----------
From: Francesco Pietra <firstname.lastname@example.org>
Date: Fri, Jan 2, 2009 at 8:26 PM
Subject: Failure to load amd64 overcome, though mem problems
To: amd64 Debian <email@example.com>, debian-users
Near the end of last year, during a vacation period, I posted to amd64
about a failure to boot amd64 lenny on a Supermicro H8QC8
motherboard. This board has the nVidia CK804 chipset, which is also the
memory controller, and the AMD 8132. It carries four dual-core Opteron
875 CPUs, two WD Raptor disks under RAID, as well as 8 KVR400D4R3A/2G
and 8 KVR400D4R3A/1G modules.
Lenny is set not to load the X system. The computer is powered through
an APC 1500 and Enermax EGX1000EWL. Cooling is extremely efficient.
The system was shut down correctly when top indicated 24GB total RAM.
After a few days untouched, the OS did not load; the screen showed a
series of lines starting with RDX RBP R10 R13 FS CS CR2 DR0 DR3.
After that, such lines alternated and the whole <Call Trace> started
anew several times; then everything disappeared from the screen and
could not be recovered with the keyboard.
Knoppix 5.3.1 loaded correctly, detected all 8 logical CPUs, and the raid1
partitions (mdadm) were OK; however, it detected 20GB total mem
instead of the 24GB expected.
memtest86+-2.11 detected 17GB total mem and was left to run for the
whole 8 cycles (which took seven hours), reporting no mem errors. DMI
mem device info showed:
DIMM 0 to DIMM 7: size 64; speed 400; type DDR
DIMM 8 to DIMM 10: size empty; speed 200; type DDR
DIMM 11: size 2048; speed 200; type DDR
DIMM 12 to DIMM 15: size 64; speed 200; type DDR.
On rebooting, lenny started correctly. Top showed 18079572k total,
also when running a parallelized application that engaged all 8 CPUs.
lshw agreed with memtest as to the DIMMs, except for the one marked as
size = 2048, which lshw marked as size = 64.
I was surprised that half of the slots were reported by both memtest
and lshw at speed=200; I tentatively assume this is a feature of the
mainboard, not of the mem slots.
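
As a quick check of the figures above, here is a small Python sketch
of the arithmetic. It assumes that the "k" in top's total means KiB
(1024 bytes) and that the 2G and 1G modules are 2 GiB and 1 GiB each;
the variable names are only illustrative.

```python
# Installed modules as listed in this mail: 8 x 2 GiB and 8 x 1 GiB.
installed_gib = 8 * 2 + 8 * 1

# Total memory reported by top, assumed to be in KiB.
top_total_kib = 18079572
top_total_gib = top_total_kib / 1024**2

# Memory that is installed but not seen by the running system.
missing_gib = installed_gib - top_total_gib

print(f"installed:   {installed_gib} GiB")
print(f"seen by top: {top_total_gib:.2f} GiB")
print(f"unaccounted: {missing_gib:.2f} GiB")
```

Under these assumptions top sees about 17.24 GiB, in the same range as
the 17GB reported by memtest86+, so nearly 7 GiB of the installed 24
are not being detected.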
The detected mem size is insufficient for my computations, and I believe
the DIMMs reported as empty need attention. There is no system maintainer
here and I have to try to restore the system alone, also because I
assembled the computer. My question is where to start at this point. The
mem slots seem to be plugged in as before but I did not try to remove
The four blocks on the mainboard were filled as follows:
This mail originally started under the hypothesis that the problem was
some degradation of lenny. I understand now that this mail is largely
off topic on both amd64 and users. I only hope that experienced users
can offer suggestions from their experience.
Thanks and happy 2009!