apparent crashes persist.
It's been a while now, about five months, and I still haven't got my
AMD64 working properly. Granted, I haven't been working at it *all* the
time; I've taken time off to bury my father and deal with an estate and
taxes. There are always death and taxes, aren't there? It's working
better than it was originally, but it still crashes. At least, it gives
the appearance of crashing. The mouse freezes, and nothing seems to
work properly except for the reset button.
Close investigation indicates the machine is not quite completely dead. At least
sometimes, the keyboard is still active for things like using tab to
switch between fields in Firefox. But control-alt-F* doesn't let me
switch virtual terminals any more. There's little I can do but press
reset. Control-alt-backspace shuts my X session down, but doesn't get
me to anyplace I can recover from.
But another thing still works, too. This morning I discovered that if I
already have an ssh connexion into the machine from elsewhere, I can
continue using it. Except that response time has ballooned -- sometimes
it responds in about ten or fifteen seconds, but usually it's minutes.
Eventually, though, it does respond, so it's not dead. I'll call this
behaviour a "crash" in the rest of this message, though, as the machine
is pretty well useless until a reboot.
Evidently, something is hogging some critical resource, possibly the CPU
(the usual suspect) but it could also be a networking resource, since
that's what the mouse and ssh seem to have in common. Any other
But a lot of the system *does* run well.
When used as a file server, it works flawlessly.
When logged into remotely from another machine using XDMCP, it works
flawlessly.
When used locally in text mode, it works flawlessly.
When logged in locally using the X server it crashes. I suspect a
measure of software involvement in the crashes, because when I upgraded
the nvidia drivers from 1.0.8756-1 to 1.0.8762-2 and also upgraded the
kernel from 2.6.12 to 2.6.15 the crashes became less frequent. Before,
it seemed that it crashed when the mouse moved during screen updates.
Now it seems (mostly) just to crash during menu operations. Once it
even crashed in gdm when I was using the menu to select my session. It
seems to crash less with fvwm than icewm.
The motherboard is an Asus A8N-VM-UAYGZ
described on the box as
Socket 939, nVidia
channel DDR,VGA integrated, PCI Express
X16 6-channel HD audio, 10/100 LAN,
ATA133*2+SATA II*28 USB 2.0,
Q-Fan technology, CrashFree BIOS
I have 2G RAM and 1G swap.
It normally boots a 2.6.15 stock Debian kernel, runs etch, has nvidia
drivers I compiled from the Debian nvidia-kernel-source package.
It can also boot a 2.6.12 kernel, which I keep around to have an
alternative in case things go really wrong. For this reason udev is
held at 0.091-2, since the latest etch udev, 0.093-1, rejects the 2.6.12
kernel.
There is no Microsoft software on this machine. Not even close. Not
text consoles, xterms. I can compile, run, edit, use emacs, ssh to
another machine to read my mail, edit quotas, run aptitude, etc. All
with no trouble.
* menus, sometimes, but enough to make me leery of opening menus.
* xjig, after a few puzzles, but rarely. About 5 to 10% chance of
crashing per puzzle. It used to be about 20% with the old nvidia
drivers and 2.6.12.
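
To put the xjig figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes each puzzle crashes independently with a fixed probability, which is only an assumption and probably not quite true; the function names are made up for illustration.

```python
def crash_within(p, n):
    """Chance of at least one crash in n puzzles.

    Assumes every puzzle crashes independently with probability p.
    """
    return 1.0 - (1.0 - p) ** n


def expected_puzzles(p):
    """Mean number of puzzles up to and including the first crash."""
    return 1.0 / p


# Per-puzzle crash rates quoted above: 5-10% now, about 20% before.
for label, p in [("now, low estimate", 0.05),
                 ("now, high estimate", 0.10),
                 ("old drivers, 2.6.12", 0.20)]:
    print("%-20s crash within 5 puzzles: %3.0f%%   mean puzzles to crash: %4.1f"
          % (label, 100 * crash_within(p, 5), expected_puzzles(p)))
```

If the independence assumption holds even roughly, "a few puzzles" is enough for a noticeable chance of a crash at the current rates, which fits what I see.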
What crashes fast:
* Firefox, especially when scrolling on complex web pages.
* Other browsers, but I forget which ones.
What should I do to try and diagnose the problem?
* If I get into the system using a text login after a crash but before a
reboot, what information should I collect? From where in the system?
* What logs should I still bother collecting *after a reboot*, in case I
can't do it before the reboot?
* What software is available to test this hardware?
* Would it be diagnostic to do a fresh install of the 32-bit etch in a new
partition? If so, which installer is actually likely to work these
days? (I've been seeing some worrisome reports on debian-user.) The
installer I use *has* to be able to handle software RAID and lvm, since
/home is on a reiser LVM software-RAID partition. Is there any
particular problem in having two boot partitions and having to keep LVM
configuration files synchronised?
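
For the first two questions, one thing that could be left running before the next crash is a small heartbeat logger. The following is only a minimal sketch, not a proven diagnostic. It assumes a Linux /proc filesystem, and the function names, log path and interval are all hypothetical. It appends a timestamp, the load averages and free memory to a file every few seconds, and forces each line to disk so that it survives a press of the reset button. A long gap between consecutive timestamps would show when the machine stalled and for how long.

```python
import os
import time


def read_snapshot():
    """Return one line of text: timestamp, load averages, free memory."""
    load1, load5, load15 = os.getloadavg()
    mem_free = "unknown"
    try:
        # /proc/meminfo is Linux-specific; fall back to "unknown" elsewhere.
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemFree:"):
                    mem_free = " ".join(line.split()[1:])
                    break
    except OSError:
        pass
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s load=%.2f,%.2f,%.2f memfree=%s" % (
        stamp, load1, load5, load15, mem_free)


def log_heartbeats(path, interval=5.0, count=None):
    """Append snapshots to path every interval seconds.

    Runs until interrupted if count is None, otherwise writes count lines.
    Each line is flushed and fsync'ed so it is on disk before the next one.
    Returns the number of lines written.
    """
    written = 0
    with open(path, "a") as log:
        while count is None or written < count:
            log.write(read_snapshot() + "\n")
            log.flush()
            os.fsync(log.fileno())
            written += 1
            if count is None or written < count:
                time.sleep(interval)
    return written
```

It could be started from a text console or an ssh session before logging in to X, for example with log_heartbeats("/var/tmp/heartbeat.log"), and the file read back after the reboot.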