
Re: make bzImage

watchdog info below-

Peter Keel wrote:

> > fails on my Sparc10
> I've got exactly the same problem. My vmlinux won't let itself boot
> or whatever. It doesn't display "Loading Linux", instead something
> (dunno what this is) is displayed, and "Watchodg reset" or something.
> Any ideas? Why can't I make vmlinuz anyway?

What is a watchdog reset?

A watchdog reset is an unrecoverable condition that forces the CPU
to reset.  It occurs when the machine takes a trap while it is already
handling a trap, at a point where the "Enable Traps" (ET) bit in the
Processor Status Register (PSR) is cleared.  Traps are disabled at
that point because no other trap should occur until the first trap
has been handled.  Because a second trap has occurred and the cpu
cannot handle it, the machine resets.
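
The sequence above can be sketched as a toy model.  This is only an
illustration of the rule "trap while traps are disabled means reset";
the class and method names are made up for this example, and it is
not a SPARC emulator.

```python
from dataclasses import dataclass


@dataclass
class ToyCpu:
    """Hypothetical model of the trap-enable rule, nothing more."""
    et: bool = True        # models the "Enable Traps" bit of the PSR
    halted: bool = False   # set once the model has taken a watchdog reset

    def trap(self) -> str:
        """Deliver one trap and report what the model does with it."""
        if self.halted:
            raise RuntimeError("model already took a watchdog reset")
        if self.et:
            # Normal case: enter the handler with further traps disabled.
            self.et = False
            return "handler entered"
        # A second trap arrived while traps were disabled: it cannot be handled.
        self.halted = True
        return "watchdog reset"

    def return_from_trap(self) -> None:
        """The handler finished, so traps are enabled again."""
        self.et = True


cpu = ToyCpu()
first = cpu.trap()          # taken normally
cpu.return_from_trap()
second = cpu.trap()         # taken normally again
third = cpu.trap()          # arrives inside the handler
print(first, "/", second, "/", third)
```

Only the third trap, which arrives before the handler has returned,
ends in a reset.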

Are there any other reasons that a system would drop to the ok prompt?

There are several other reasons.  First, if the system receives a
break via the console (because Stop-A was typed or the keyboard was
unplugged and replugged on a regular console, or if a break was sent
from a tty console), it will halt and produce the ok prompt.  We
recommend that this be attempted on a hung (unresponsive) system.
A kernel feature known as a deadman timer can also be enabled in
an effort to diagnose a hung system.  If this is enabled, when the
system hangs it will be dropped to the ok prompt.

Is a watchdog reset the same as a system panic?

No.  On a system panic, the system saves the kernel context to the
system's swap disk, and then sets a flag indicating there is a crash
dump before it reboots.  If savecore is enabled, a crash dump is recovered
during reboot, or a manual savecore can be run shortly after the reboot.
On a watchdog reset, minimal information is saved, and then the system
simply halts.

What happens when a system gets a watchdog?

The behavior of the system after a watchdog is determined by the value
of the watchdog-reboot prom variable.  To see the value, from a running
system use the eeprom command.  From an ok prompt, use the env command.
The default value is false.
This value will cause the system to stay at the ok prompt after a watchdog reset.
If watchdog-reboot is set to true, the system will reboot automatically.
If a system is rebooting for no discernible reason, we advise checking the
value of this parameter and setting it to false if it is true.  If the
system has been experiencing watchdog resets, this will allow useful data
to be collected the next time one occurs.
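
As a rough sketch of the rule above, the following maps one line of
saved eeprom output to the behavior described.  It assumes the line
has the form name=value and that the name begins with watchdog-reboot
(it may be printed with a trailing "?"); both are assumptions to check
against your own system, and the function name is made up.

```python
def behavior_after_watchdog(eeprom_line: str) -> str:
    """Describe what the system does after a watchdog reset.

    `eeprom_line` is assumed to look like 'name=value'.
    """
    name, sep, value = eeprom_line.strip().partition("=")
    if not sep or not name.startswith("watchdog-reboot"):
        raise ValueError(f"not a watchdog-reboot setting: {eeprom_line!r}")
    if value == "true":
        return "reboot automatically"
    if value == "false":
        return "stay at the ok prompt"
    raise ValueError(f"unexpected value: {value!r}")


print(behavior_after_watchdog("watchdog-reboot?=false"))
```

A value of false is the one you want while collecting data on a
watchdog problem.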

Is the procedure for dealing with watchdogs the same for all Sun systems?

No.  Some of the commands will work on all systems, and others are
only relevant to certain architectures and configurations.  You
can determine the architecture of a running system by using the
command uname -a, and observing the fifth field returned, which
should be sun4, sun4d, sun4m, sun4u, etc.  You can determine if
a system is a multi-processor (MP) system by using the mpstat
command.  If it returns just one line of data below the column header,
it is a single-cpu system.  Otherwise, it is a multi-processor system
with a cpu represented by each line of data.  An MP system will include
in the prom prompt
an indication of what cpu experienced the halt, for example <#2>
which indicates cpu2.  Please write down the number, as it can be
helpful in identifying which cpu to replace if the cause is found
to be a defective one.
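
If you have saved the output of these commands, the checks above can
be sketched as follows.  The function names and all sample strings
are invented for illustration; real uname and mpstat output has more
columns, so treat this as a sketch only.

```python
import re


def architecture(uname_a: str) -> str:
    """Return the fifth whitespace-separated field of `uname -a` output."""
    fields = uname_a.split()
    if len(fields) < 5:
        raise ValueError("expected at least five fields")
    return fields[4]


def cpu_count(mpstat_output: str) -> int:
    """Count data lines, i.e. lines whose first field is a cpu number."""
    count = 0
    for line in mpstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            count += 1
    return count


def cpu_from_prompt(prompt: str) -> int:
    """Extract the cpu number from an MP prom prompt such as '<#2> ok'."""
    match = re.search(r"<#(\d+)>", prompt)
    if match is None:
        raise ValueError("no cpu indication in prompt")
    return int(match.group(1))


# Hypothetical, abbreviated samples (not captured from a real machine).
sample_uname = "SunOS myhost 5.6 Generic sun4m sparc SUNW,SPARCstation-10"
sample_mpstat = "CPU minf mjf xcal\n  0    1   0    0\n  2    1   0    0\n"

print(architecture(sample_uname))
print(cpu_count(sample_mpstat))
print(cpu_from_prompt("<#2> ok"))
```

Counting only lines that start with a number keeps the column header
out of the cpu count.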

What commands should be typed from the ok prompt?

The commands are described below.  There is a feature called obpsym
which, when enabled, will allow certain of the commands to provide
symbolic information which will make interpretation by Sun Customer
Services easier (and probably faster).  If you do not know how to
enable this, ask someone at Sun to send you internal infodoc 15876.

Commands that work on all systems:

.registers   This displays the internal registers of the current cpu.
.locals      This displays the registers in the current register window.
ctrace       This displays the kernel stack.  If obpsym is enabled,
             the output includes useful symbolic information.  If not,
             it produces numbers which must be interpreted in conjunction
             with a crash dump.  This is the single most useful command.

System-specific commands:

.psr         Only available on systems supporting SPARC V8 architecture.
             If you're not certain, try it.  Prints the Processor
             Status Register in a readable format.
wd-dump      Only available on sun4d architecture.  Displays watchdog
             data including the program counter of the instruction that
             caused the crash.
