
Re: Crash postmortem.



on Fri, Oct 05, 2001 at 01:18:10PM +0200, oivvio polite (ol1@v10a.com) wrote:
> 
> I've been experiencing some system instability lately.

> The system crashes like once a week or something. Ctrl-Alt-Delete has
> no effect. And I can't even ssh in from another host to do a graceful
> shutdown.  Of course I'm not asking any of you to analyze my problem
> with this little information. Rather I'm asking for tips and general
> methods for analyzing crashes after the fact. Or maybe some prog that
> monitors all processes and warns me when something is starting to go
> haywire?

I'm attaching my kernel bug report script.  If you're getting Oopses,
you can run this to generate some useful kernel debug info.

In general, check your system logs for any messages which might be
related to the condition.  Anything unusual or repeatedly occurring prior
to crashes is suspect.
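
One way to do that log check is to count repeated trouble messages.  This
is only a rough sketch: the function name scan_logs, the pattern list,
and the log paths in the example are my assumptions, not something your
system is guaranteed to match, so adjust them to taste.

```shell
#!/bin/bash
# Hypothetical helper: count repeated suspicious messages in syslog-style files.
# The pattern list is an assumption; extend it with whatever your logs show.
suspect_pattern='Oops|panic|BUG|segfault|I/O error|Out of memory'

scan_logs () {
    # Strip the leading "Mon DD HH:MM:SS host " stamp so that identical
    # messages collapse into a single counted line, most frequent first.
    grep -hE "$suspect_pattern" "$@" 2>/dev/null |
        sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} [^ ]+ //' |
        sort | uniq -c | sort -rn
}

# Example (paths are assumptions):
#   scan_logs /var/log/kern.log /var/log/messages
```

Messages near the top of the output that also cluster just before a
crash are the ones to chase first.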

Memory and CPU are standard culprits.  Run a memory test.  Note that if
you're using memtest86, it runs continuously, logging results (unless it
fails).  The LNX-BBC bootable business card has a boot mode where this
is *all* that runs, for maximum test effectiveness.

Try running w/ and w/o X.  Video is a big cause of problems.

Remove kernel modules and check system performance w/ and w/o same.
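
To keep track of which modules were loaded in each trial, something like
the following might help.  It's a minimal sketch: snapshot_modules and
removed_modules are names I made up, and it assumes the module name is
the first field of each line in /proc/modules.

```shell
#!/bin/bash
# Hypothetical helpers for comparing module sets between test runs.

# Print the sorted module names from a /proc/modules-style file.
snapshot_modules () {
    # $1: file to read; defaults to /proc/modules
    awk '{ print $1 }' "${1:-/proc/modules}" | sort
}

# Print modules listed in snapshot file $1 but absent from snapshot file $2.
removed_modules () {
    comm -23 "$1" "$2"
}

# Example (file names are assumptions):
#   snapshot_modules > before.txt
#   ... unload a module, run the system for a while ...
#   snapshot_modules > after.txt
#   removed_modules before.txt after.txt
```

Note which snapshot was in effect for each crash-free (or crashing)
period, so you can tell which removal made the difference.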

Remove hardware (cards, peripherals) and check system performance w/ and
w/o same.

There are online guides to kernel bug diagnostics; Google is your
friend.

Most GNU/Linux system crashes are caused by:

   - Bad hardware.
   - Buggy drivers.
   - Misconfigured or inappropriate drivers.

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What part of "Gestalt" don't you understand?              Home of the brave
  http://gestalt-system.sourceforge.net/                    Land of the free
   Free Dmitry! Boycott Adobe! Repeal the DMCA!  http://www.freesklyarov.org
Geek for Hire                      http://kmself.home.netcom.com/resume.html
#!/bin/bash

# Kernel bug report generator script
# Script generated from prior bug report form by Karsten M. Self
# $Revision: 1.3 $ $Date: 2000/05/13 07:48:36 $ $Author: root $


# ------------------------------------------------------------------------
# [Some of this is taken from Frohwalt Egerer's original linux-kernel FAQ]
# 
#      What follows is a suggested procedure for reporting Linux bugs. You
# aren't obliged to use the bug reporting format, it is provided as a guide
# to the kind of information that can be useful to developers - no more.
# 
#      If the failure includes an "OOPS:" type message in your log or on
# screen please read "Documentation/oops-tracing.txt" before posting your
# bug report. This explains what you should do with the "Oops" information
# to make it useful to the recipient.
# 
#       Send the output to the maintainer of the kernel area that seems to
# be involved with the problem. Don't worry too much about getting the
# wrong person. If you are unsure send it to the person responsible for the
# code relevant to what you were doing. If it occurs repeatably try and
# describe how to recreate it. That is worth even more than the oops itself.
# The list of maintainers is in the MAINTAINERS file in this directory.
# 
#       If you are totally stumped as to whom to send the report, send it to
# linux-kernel@vger.rutgers.edu. (For more information on the linux-kernel
# mailing list see http://www.tux.org/lkml/).
# 
# This is a suggested format for a bug report sent to the Linux kernel mailing 
# list. Having a standardized bug report form makes it easier  for you not to 
# overlook things, and easier for the developers to find the pieces of 
# information they're really interested in. Don't feel you have to follow it.
# 
#    First run the ver_linux script included as scripts/ver_linux or
# at <URL:ftp://ftp.sai.msu.su/pub/Linux/ver_linux> It checks out
# the version of some important subsystems.  Run it with the command
# "sh scripts/ver_linux"
# 
# Use that information to fill in all fields of the bug report form, and
# post it to the mailing list with a subject of "PROBLEM: <one line
# summary from [1.]>" for easy identification by the developers    
# ------------------------------------------------------------------------

# indent by one tabstop
function tabout () { sed -e '/^/s//	/'; }

kversion=$( uname -r )
dmesg=dmesg                     # for live use
# dmesg="cat /var/log/kern.log"	# for debugging only: uncomment to read a saved log instead
oops_number=$( $dmesg | grep Oops | tail -1 | sed -e '/^.*:/s///' )
oops_module=$( $dmesg | grep EIP | tail -1 | sed -e '/^.*:/s///' )

cat <<EOF

This is a script-generated kernel bug report.  

The system administrator/developer should provide additional information 
where appropriate.

kernel-bug-report: $Revision: 1.3 $ $Date: 2000/05/13 07:48:36 $ $Author: root $

[1.] One line summary of the problem:    

	PROBLEM:  $1 oops ${oops_number:-XXX-OOPS-NUMBER-XXX} in ${oops_module:-XXX-MODULE-XXX}, $kversion kernel

[2.] Full description of the problem/report:

	n/a

[3.] Keywords (i.e., modules, networking, kernel):

	linux kernel $kversion oops ${oops_number:-XXX-OOPS-NUMBER-XXX} ${oops_module:-XXX-MODULE-XXX}

[4.] Kernel version (from /proc/version):

$( cat /proc/version | tabout )

[5.] Output of Oops.. message (if applicable) with symbolic information 
     resolved (see Documentation/oops-tracing.txt)

$( $dmesg | ksymoops -k /proc/ksyms | tabout )

[6.] A small shell script or example program which triggers the
     problem (if possible)

	n/a

[7.] Environment

$( set | tabout )

[7.1.] Software (add the output of the ver_linux script here)

$( sh -f /usr/src/linux/scripts/ver_linux | tabout )

[7.2.] Processor information (from /proc/cpuinfo):

$( cat /proc/cpuinfo | tabout )

[7.3.] Module information (from /proc/modules):

$( cat /proc/modules | tabout )

[7.4.] SCSI information (from /proc/scsi/scsi)

$( cat /proc/scsi/scsi | tabout )

[7.5.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

	System memory (at time of oops):
$( cat /proc/meminfo | tabout )

	System uptime:
$( uptime | tabout )

[X.] Other notes, patches, fixes, workarounds:
EOF
