
Re: diagnosing hard-locks [was memtest+ won't load]



Andrew Sackville-West wrote:
> On Fri, Jul 07, 2006 at 11:08:00AM +0200, Dominique Dumont wrote:
> > Andrew Sackville-West <andrew@farwestbilliards.com> writes:
> > 
> > > now on to finding the source of my hard locks. ugh./
> > 
> > Be sure to check the temperature of your CPU. (been there) 
> 
> yeah, another good idea. unfortunately, that's a no go either. I'm
> running at a pretty steady 132 F/ 56 C right now with it climbing to
> about 140F, but no more when I'm pushing it. Also, it seems more
> common when I'm NOT working. Also, doesn't crop up when I leave some
> crunching job running overnight (like a big transcode job or something).
> 
> I ran memtest all night with no problems at all. Also ran stesscpu for
> a while with no probs. I'm beginning to think, as i crawl through my
> memory, that it might be related to some acpi stuff I was fooling
> with. we'll see.
> 
> These locks are particularly frustrating as they leave no trace in the
> logs. the logs just stop until they come up from the reboot. I've seen
> it happen while I'm working maybe two or three times... just lock
> right up and be totally gone. screen looks fine, nothing responds, no
> ssh, no response to keyboard (capslock/numlock frozen) etc. Sometimes
> even the restart button on the box won't work and I'll have to force a
> shutdown with the 4-second power button press. So I have no doubts
> that its locking up tight. This is part of why I think it might be
> related to that acpi stuff as it seems to happen a lot when I'm NOT
> working at it. perhaps the bios is trying to suspend something and it
> causes a problem? 

I'm running a dual-boot; 
[hdc] Debian Sarge (3.1r2, kernel 2.6.8-3-686) 
[hda] win98 

Couple of thoughts;

* I notice the "nobody" account/group(?) starts thrashing the disk late at
night, after the monitor (only) has been sleeping for a little while, which
means I'm not using the system and its resources at that time (idle). I ran
'top' when I heard this going on (fearing a rootkit), and found 'nobody'
running 'find', IIRC. I suspect it's updating the databases ('updatedb'), or
'inodes' or the ext3 journal, or defragging (the linux way) - likely triggered
as a 'cron' job/task perhaps(?), yet pushed off until there are enough system
resources available -- I'm still not sure exactly what it's doing, but the 1st
time it happened, I ran 'kill -9 PID' on the process ID found using 'top'. That
killed it alright, but only until the system had 'rested' again, and then off
to the races it went once more.
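
A rough sketch for tracking down which cron job is behind that nightly 'find'
run, rather than killing it blindly. The cron directories listed are an
assumption (the usual Debian ones), and the function name is made up for
illustration. As far as I know, Debian runs the locate database update from a
daily cron script as user 'nobody', which would match what 'top' showed.

```shell
#!/bin/sh
# List cron scripts that mention updatedb.
# Assumption: the default directories below are the usual Debian cron
# locations; pass other directories as arguments to override them.
cron_scripts_mentioning_updatedb() {
    [ "$#" -gt 0 ] || set -- /etc/cron.daily /etc/cron.weekly /etc/cron.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -l prints only the names of files that contain a match
        grep -l updatedb "$dir"/* 2>/dev/null
    done
    return 0
}

cron_scripts_mentioning_updatedb

# Show running find/updatedb processes with owner and parent PID, so the
# parent (e.g. cron) can be identified.
ps -eo user,pid,ppid,args | grep -E '[ /](find|updatedb)( |$)' || true
```

If a script turns up, reading it should show what it scans and when it runs.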

ACPI hmm.. Let's see output of 'dmesg | grep ACPI'

Do you use an 'acpi=force' kernel boot option in GRUB/Lilo ??
I do on this ~1999 PII, 350MHz 100FSB, 192MB RAM;

~$ cat /boot/grub/menu.lst
=============
[...]

title           Debian GNU/Linux, kernel 2.6.8-3-686
root            (hd1,0)
kernel          /boot/vmlinuz-2.6.8-3-686 root=/dev/hdc1 ro acpi=force
initrd          /boot/initrd.img-2.6.8-3-686
savedefault
boot

[...]
=============
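
To see whether the running kernel was actually booted with an ACPI-related
option, something like the following could be used. It is only a sketch: the
function name is hypothetical, and the list of options it looks for (acpi=...,
pci=noacpi, noapic, nolapic) is my own pick of common ones, not a complete set.

```shell
#!/bin/sh
# Print ACPI/APIC-related options from a kernel command line.
# Reads /proc/cmdline by default; a file path may be given instead
# (handy for checking a saved copy of the command line).
acpi_boot_options() {
    # One option per line, then keep only the ones of interest
    tr ' ' '\n' < "${1:-/proc/cmdline}" |
        grep -E '^(acpi=|pci=noacpi$|noapic$|nolapic$)'
}

acpi_boot_options || echo "no ACPI-related boot options found"
```

Comparing that output with 'dmesg | grep ACPI' should show whether the option
took effect.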

[ anecdotal rantings ensue ]
* My keyboard died recently; it was possessed by evil demons, satanic minions,
and Linda Blair :-8 -- After experiencing some similar hard lockups (but only
when returning to this box after having been away for a while), I finally
_refused_ to yank the power cord this last time. (The default ATX 4-second
power button hold doesn't quite work anymore :-( and no, it's not a wrong BIOS
or power management setting.)

I started POUNDING on the keyboard -- any/all combos -- full open flat hands
and all :-)
I ended up with a Keyboard that would not only NOT let me into the BIOS (F2),
it would NOT let me type ANYTHING at all !! -- Even after a hard kill and a
cold boot (actually many of them)....and many PS/2 cable yanks/plugging ins of
both the Rodent and Keyboard, with system running and sometimes not
running........
I found eventually that these "7" keys were the ONLY ones that would output
anything, and only sometimes..

These magic 7 keys are/were; (NumLock On -->) __3, 4, 5, 6, 'Alt', ']', '['__
(brackets)
Oh, the NumLock was fritzing all the time -- sometimes the CapsLock would go
on by pushing the TAB key.

...and the ONLY output of all these 7 was the SAME (a BACKSLASH character!!!)
-- yep, each one of the 7 different keys output a '\' (...or was it '/',
forward slash), I forget. But I was determined to fix it at all costs! At first
I thought it was some sort of software issue, but after realizing I could NOT
boot to a LiveCD and use the keyboard, nor could I enter the BIOS, I thought I
had munged my SuperI/O chip's Keyboard Controller :-(

I even ReFlashed the BIOS :-p
More than once...
I also moved the CMOS jumper to "Maintenance mode" and booted...nada
I then REmoved the jumper for *Recovery Mode*, which turned out to be the ONLY
way I could Flash (using a Floppy and listening for beep codes), because I
could not hit <ENTER> once booted into the Intel Flash utility <mad!>.

Long story short -- I got a NEW Keyboard (actually used), fired her up, and
All is Well !!
[ /anecdotal rantings end ]

I'm not sure the above tale is meaningful to you at all, but I wanted to share
my experience; perhaps it'll spark some ideas.

Regards
