
Re: computer hangs, how do I go about diagnosing?



on Thu, Dec 13, 2001 at 02:17:18PM -0500, Paul Reavis (preavis@partnersoft.com) wrote:
> My computer is experiencing intermittent hangs. Generally I'm doing
> something in X, but I really have no idea what the real pattern is. I
> left my machine on last night and it hung overnight.
> 
> I suspected hardware, but ran cpuburn (`burnK7`) for almost two hours
> with no trouble, and then ran a full memtest86 in real mode with no
> errors. Also I've built numerous kernels with no trouble. 

For a bit of additional detail, see my kernel-bug-report script:

    http://kmself.home.netcom.com/Download/kernel-bug-report

It's based on the REPORTING-BUGS file in /usr/src/linux, and does a good
job of compiling system information.

From your /var/log/messages output, it looks as if your problems aren't
specific to any one program:

    XFree86 (pid: 513, stackpage=cc7c9000)
    XFree86 (pid: 513, stackpage=cc7c9000)
    Eterm (pid: 2879, stackpage=d6759000)
    mixer_applet (pid: 1525, stackpage=d80e1000)
    gnome-name-serv (pid: 1493, stackpage=c4fcd000)
    mutt (pid: 3245, stackpage=d1883000)
    Eterm (pid: 1545, stackpage=dcdeb000)
    gpilotd (pid: 1497, stackpage=de019000)
    synaesthesia (pid: 1603, stackpage=d1231000)
    java (pid: 8542, stackpage=cdb05000)
    gdm (pid: 9212, stackpage=cce3f000)
    gdm (pid: 512, stackpage=cd94b000)
    gdm (pid: 503, stackpage=cc92d000)
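
If you want to check that spread yourself, a short script can tally the
names.  This is only a sketch: it assumes the lines look exactly like the
excerpt above, and tally_processes is a name I made up for illustration,
not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines of the form shown above: "<name> (pid: <n>, stackpage=<hex>)".
OOPS_PROC = re.compile(r"(\S+) \(pid: (\d+), stackpage=[0-9a-f]+\)")

def tally_processes(lines):
    """Count distinct pids per process name (hypothetical helper)."""
    seen = set()
    for line in lines:
        m = OOPS_PROC.search(line)
        if m:
            # A set drops repeated reports for the same (name, pid) pair.
            seen.add((m.group(1), int(m.group(2))))
    return Counter(name for name, _pid in seen)

sample = """\
XFree86 (pid: 513, stackpage=cc7c9000)
XFree86 (pid: 513, stackpage=cc7c9000)
Eterm (pid: 2879, stackpage=d6759000)
mixer_applet (pid: 1525, stackpage=d80e1000)
gnome-name-serv (pid: 1493, stackpage=c4fcd000)
mutt (pid: 3245, stackpage=d1883000)
Eterm (pid: 1545, stackpage=dcdeb000)
gpilotd (pid: 1497, stackpage=de019000)
synaesthesia (pid: 1603, stackpage=d1231000)
java (pid: 8542, stackpage=cdb05000)
gdm (pid: 9212, stackpage=cce3f000)
gdm (pid: 512, stackpage=cd94b000)
gdm (pid: 503, stackpage=cc92d000)
"""

counts = tally_processes(sample.splitlines())
for name, n in sorted(counts.items()):
    print(f"{name}: {n} distinct pid(s)")
```

Nine different program names turn up, which is why I doubt any single
application is at fault.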


My own suspicions would be kernel modules and hardware, in about that
order.

It's not clear when the problems started.  You've changed a few things
on your system.  Did the problems exist prior to the sid upgrade or when
running other kernels?  Do they occur when the system's running console
rather than X?  How "frozen" is the system -- just locked in X, or
nonresponsive to network or serial port access?

The bug reporting script will list your loaded modules.  Might also help
to see what your kernel options are.
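
If you'd rather pull the module list by hand, something along these
lines should do.  It's a rough sketch that assumes each line of
/proc/modules starts with the module name followed by its size, which is
the layout I'd expect on a 2.4 kernel; parse_modules and loaded_modules
are hypothetical helper names, not anything from my script.

```python
def parse_modules(text):
    """Return (name, size) pairs from /proc/modules-style text.

    Assumes the first two whitespace-separated fields on each line are
    the module name and its size in bytes; later fields are ignored.
    """
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        modules.append((fields[0], int(fields[1])))
    return modules

def loaded_modules(path="/proc/modules"):
    """Read and parse the kernel's module list (Linux only)."""
    with open(path) as f:
        return parse_modules(f.read())

# Example use on a running system:
#     for name, size in loaded_modules():
#         print(name, size)
```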

You've run some CPU-intensive tests without trouble, and memtest86 isn't
reporting any errors.  Beyond that, you might try the following:

  - Check component and cable seating.  Poorly-seated cards and cables
    may result in strange errors, and possibly severe damage.

  - Run the system with a reduced memory allocation.  Add "mem=XXX" at
    the LILO boot prompt, where XXX is some value less than your
    installed amount.  Cutting the allocation to 1/2 or 1/4 is a
    reasonable way to find out whether or not the problem is memory.

  - Unload or disable loaded modules, particularly those for networking
    or device support.  I had similar errors at one point through Samba,
    with a flaky Samba server at the other end, on a 2.2.14 kernel
    (note:  2._2_, not 2._4_).

  - Boot an alternative kernel.  Is this a kernel you built yourself, or
    is it a stock kernel?  Either way, try something else.

  - Run the system in single-user mode with most partitions mounted r/o,
    while otherwise running a workload similar to the one that triggers
    the errors.
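
For the "mem=" suggestion, the arithmetic is trivial, but here is a
small sketch of it anyway.  The 512 MB figure is purely an assumption
for illustration (you haven't said how much RAM is installed), and
reduced_mem_args is a hypothetical helper name.

```python
def reduced_mem_args(installed_mb, divisors=(2, 4)):
    """Build "mem=" boot arguments for fractions of installed RAM.

    installed_mb: installed memory in megabytes (assumed, not detected).
    divisors:     2 and 4 give the 1/2 and 1/4 allocations.
    """
    return [f"mem={installed_mb // d}M" for d in divisors]

# Hypothetical machine with 512 MB installed:
for arg in reduced_mem_args(512):
    print(arg)
# prints "mem=256M" then "mem=128M"
```

Append the resulting argument after the image label at the LILO boot
prompt.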

> It is a sid install, on newish hardware (athlon 1.4G) with a 2.4.14 kernel. I
> updated to latest sid yesterday.

Did the problems exist before this upgrade?

> Follows is some errors from /var/log/messages.
> 
> Could someone clue me in on this, or at least point me towards other
> troubleshooting/diagnostic tools I could try? Thanks.
> 
> Dec 13 01:23:19 slackguy kernel: invalid operand: 0000
> Dec 13 01:23:19 slackguy kernel: CPU:    0
> Dec 13 01:23:19 slackguy kernel: EIP:    0010:[rmqueue+89/448]    Tainted: P 

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What part of "Gestalt" don't you understand?              Home of the brave
  http://gestalt-system.sourceforge.net/                    Land of the free
We freed Dmitry! Boycott Adobe! Repeal the DMCA! http://www.freesklyarov.org
Geek for Hire                      http://kmself.home.netcom.com/resume.html
