
apparent crashes persist.



It's been a while now, about five months, and I still haven't got my 
AMD64 working properly.  Granted, I haven't been working at it *all* the 
time, have taken time off to bury my father and deal with an estate and 
taxes.  There are always death and taxes, aren't there?  It's working 
better than it was originally, but it still crashes.  At least, it gives 
the appearance of crashing.  The mouse freezes, and nothing seems to 
work properly except for the reset button.

Close investigation indicates the machine is not quite completely dead.  At least 
sometimes, the keyboard is still active for things like using tab to 
switch between fields in Firefox.  But control-alt-F* doesn't let me 
switch virtual terminals any more.  There's little I can do but press 
reset.  Control-alt-backspace shuts my X session down, but doesn't get 
me to anyplace I can recover from.

But another thing still works, too.  This morning I discovered that if I 
already have an ssh connexion into the machine from elsewhere, I can 
continue using it.  Except that response time has ballooned -- sometimes 
it responds in about ten or fifteen seconds, but usually it's minutes.  
Eventually, though, it does respond, so it's not dead.  I'll call this 
behaviour a "crash" in the rest of this message, though, as the machine 
is pretty well useless until a reboot.

Evidently, something is hogging some critical resource, possibly the CPU 
(the usual suspect) but it could also be a networking resource, since 
that's what the mouse and ssh seem to have in common.  Any other 
candidates?
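
One way to test the resource-hog guess would be to leave a small logger running over ssh before the next crash.  The following is only a rough sketch in Python 3, not something I have run against this fault: it samples /proc/loadavg and /proc/interrupts every few seconds and records which interrupt lines grew fastest.  The function names, the log path and the five-second interval are all made up for illustration.

```python
import os
import time


def parse_interrupts(text):
    """Return {irq_label: total count across CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row looks like "CPU0 CPU1 ..."
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, _, rest = line.partition(":")
        # Only the first ncpu fields are counters; the rest describe the device.
        counts = [int(f) for f in rest.split()[:ncpu] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def busiest(before, after, top=5):
    """Return the `top` interrupt lines whose counts grew most between samples."""
    deltas = {k: v - before.get(k, 0) for k, v in after.items()}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]


def watch(log_path="/var/tmp/hang-watch.log", interval=5.0):
    """Append load average and fastest-growing interrupts to log_path forever."""
    with open("/proc/interrupts") as f:
        prev = parse_interrupts(f.read())
    with open(log_path, "a") as log:
        while True:
            time.sleep(interval)
            with open("/proc/loadavg") as f:
                load = f.read().strip()
            with open("/proc/interrupts") as f:
                cur = parse_interrupts(f.read())
            stamp = time.strftime("%H:%M:%S")
            log.write("%s load=%s irqs=%s\n" % (stamp, load, busiest(prev, cur)))
            log.flush()
            os.fsync(log.fileno())  # push to disk so the entry survives a reset
            prev = cur


if __name__ == "__main__":
    watch()
```

If the load average climbs during a crash, the CPU is the likelier suspect; if one interrupt line's count runs away instead, that points at whatever device sits on that line.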

---

But a lot of the system *does* run well.

When used as a file server, it works flawlessly.

When logged into remotely from another machine using XDMCP, it works 
flawlessly.

When used locally in text mode, it works flawlessly.

When logged in locally using the X server it crashes.  I suspect a 
measure of software involvement in the crashes, because when I upgraded 
the nvidia drivers from 1.0.8756-1 to 1.0.8762-2 and also upgraded the 
kernel from 2.6.12 to 2.6.15  the crashes became less frequent.  Before, 
it seemed that it crashed when the mouse moved during screen updates.  
Now it seems (mostly) just to crash during menu operations.  Once it 
even crashed in gdm when I was using the menu to select my session.  It 
seems to crash less with fvwm than icewm.

---

The motherboard is an Asus A8N-VM-UAYGZ

described on the box as 
Socket 939, nVidia
GeForce6100+nForce410, 2000MT/s, Dual-
channel DDR, VGA integrated, PCI Express
X16, 6-channel HD audio, 10/100 LAN,
ATA133*2+SATA II*2, 8 USB 2.0,
Q-Fan technology, CrashFree BIOS

I have 2 G RAM, and 1G swap.

It normally boots a 2.6.15 stock Debian kernel, runs etch, has nvidia 
drivers I compiled from the Debian nvidia-kernel-source package.

It can also boot a 2.6.12 kernel, which I keep around to have an 
alternative in case things go really wrong.  For this reason udev is 
held at 0.091-2, since the latest etch udev, 0.093-1, rejects the 2.6.12 
kernel.

There is no Microsoft software on this machine.  Not even close.  Not 
even wine.

---

What works:

text consoles, xterms.  I can compile, run, edit, use emacs, ssh to 
another machine to read my mail, edit quotas, run aptitude, etc.  All 
with no trouble.

What crashes:

* menus, sometimes, but enough to make me leery of opening menus.
* xjig, after a few puzzles, but rarely. About 5 to 10% chance of 
crashing per puzzle.  It used to be about 20% with the old nvidia 
drivers and 2.6.12.

What crashes fast:

* Firefox, especially when scrolling on complex web pages.
* Pan

Other browsers, but I forget which ones.

---

What should I do to try and diagnose the problem?

* If I get into the system using a text login after a crash but before a 
reboot, what information should I collect?  From where in the system?
* What logs should I still bother collecting *after a reboot*, in case I 
can't do it before the reboot?
* What software is available to test this hardware?

* Would it be diagnostic to do a fresh install of the 32-bit etch in a new 
partition?  If so, which installer is actually likely to work these 
days?  (I've been seeing some worrisome reports on debian-user.)  The 
installer I use *has* to be able to handle software RAID and lvm, since 
/home is on a reiser LVM software-RAID partition.  Is there any 
particular problem in having two boot partitions and having to keep LVM 
configuration files synchronised?
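
For the log questions above, my own rough starting point would be something like the Python 3 sketch below, which pulls suspicious-looking lines out of the usual Debian log files after a reboot.  The pattern list is only a guess at what matters ("NVRM" is the tag the nvidia kernel module puts on its kernel messages, "(EE)" marks errors in the X server log), and the file list and function names are invented for illustration.  Corrections welcome.

```python
import re

# Guessed patterns; case-sensitive so that "debug" does not match "BUG".
PATTERNS = re.compile(r"NVRM|Oops|\bBUG\b|lockup|nobody cared|\(EE\)")

# Assumed locations; Xorg.0.log.old is the log of the previous X session.
DEFAULT_PATHS = (
    "/var/log/kern.log",
    "/var/log/syslog",
    "/var/log/Xorg.0.log.old",
)


def suspicious_lines(lines):
    """Return the lines (newline stripped) that match any guessed pattern."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


def scan(paths=DEFAULT_PATHS):
    """Return {path: [matching lines]} for every readable log file in paths."""
    found = {}
    for path in paths:
        try:
            with open(path, errors="replace") as f:
                found[path] = suspicious_lines(f)
        except OSError:
            continue  # missing or unreadable log; skip it
    return found


if __name__ == "__main__":
    for path, hits in scan().items():
        print("== %s ==" % path)
        for hit in hits:
            print(hit)
```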

-- hendrik


