Re: Netra T1 200 watchdog timeouts
Jurij Smakov wrote:
On Sun, Sep 23, 2012 at 02:07:46PM +0000, Mark Morgan Lloyd wrote:
It went in as 688521 at about the same time as you posted. Pity I
didn't hold off for another hour or so.
Thanks, I'll bcc this response to the bug, let's continue discussion
there.
OK, but a couple of slightly more verbose comments here.
Looking at the output you see, I have doubts that it has anything to
do with SILO though. SILO prints the letters 'S', 'I', 'L' and 'O'
(appearing before the prompt) as it completes execution of the
different parts of the first-stage loader. As you can see in the code
(first/first.S), printing 'S' is the first thing the first-stage loader
does upon startup. The fact that it is not seen in the console output
suggests that even the first-stage loader never got to run. The line
Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a File and args:
which is normally printed by OBP before control is passed to SILO does
not appear in the watchdog-reset case either, which, again, is a
strong sign that failure happens before SILO has a chance to run.
In a failure case, how long does it take between you typing 'boot' and
the "watchdog reset" message being displayed?
About a second.
This doc
http://docs.oracle.com/cd/E19102-01/n240.srvr/817-5481-11/understanding_wdtimer.html
appears to suggest that a stuck watchdog would initiate an XIR after 60
seconds by default; is that consistent with what you see? What are the
values of the various variables mentioned there on your system(s)? Does
increasing the timeout help?
As far as I can see, that document refers to either ALOM or Solaris
parameters. There's quite a terminology problem: the Netra T1 200 has a
port labeled "A LOM" above another labeled "B SERIAL" but from what I
can see that's /not/ a Sun ALOM port: it goes to a lomlite2 chip which
is something different.
Also there are some things that might be relevant which can only be done
by Solaris's lom command which isn't available unless you install a
not-freely-available package (it needs a device driver, unlike some of
the RSC support on e.g. a 280R which doesn't).
I really can't come up with any reason why it would work for Squeeze
but not other releases, so testing all suspect SILO versions on the
same machine would be an interesting experiment.
This is something I've not had to do before: Debian usually "just
works" or I have to go upstream if I want something bleeding-edge.
Is this syntax right and, in view of the message, what should I have
in sources.list etc?
root@firewall3:/home/markMLl# apt-get install silo=1.4.14+git20100228-1+b1
..
E: Version '1.4.14+git20100228-1+b1' for 'silo' was not found
That only works when you have repositories containing older/newer
packages listed in your /etc/apt/sources.list. Simply adding them
(without configuring apt pinning appropriately) may mess up too many
things, so the simplest way is probably to just download older SILO
debs (should be available on archive.debian.org) and install them
using dpkg -i.
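For reference, the download-and-dpkg route suggested above might look
roughly like the sketch below. It is untested on a Netra, and deb_url is
a hypothetical helper made up for illustration, not part of any Debian
tool. The mirror and version are taken from one of the URLs listed
further down in this message; the wget and dpkg steps are left as
comments because they need root on the affected machine.

```shell
#!/bin/sh
# Sketch only: fetch one specific SILO .deb and install it with dpkg,
# so that sources.list and apt pinning are left untouched.

# deb_url is a hypothetical helper (an assumption, not a Debian tool).
# It builds a pool URL from a mirror base and a silo package version.
deb_url() {
    base=$1      # e.g. http://archive.debian.org/debian
    version=$2   # e.g. 1.4.13a+git20070930-3
    printf '%s/pool/main/s/silo/silo_%s_sparc.deb\n' "$base" "$version"
}

url=$(deb_url http://archive.debian.org/debian 1.4.13a+git20070930-3)
echo "$url"

# Run these by hand as root:
#   apt-cache policy silo            # which versions apt itself can see
#   wget "$url"                      # fetch the .deb into the current dir
#   dpkg -i "${url##*/}"             # install the downloaded file
#   dpkg -s silo | grep '^Version'   # confirm the installed version
```

Repeating this once per candidate version, rebooting in between, keeps
each test independent of what the configured repositories offer.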
I can't find a binary for 1.4.14+git20100207-1 that you wanted me to
test. I can see versions as below so I'll start working through them.
http://ftp.uk.debian.org/debian/pool/main/s/silo/silo_1.4.14+git20120819-1_sparc.deb
http://ftp.uk.debian.org/debian/pool/main/s/silo/silo_1.4.14+git20100228-1+b1_sparc.deb
http://archive.debian.org/debian/pool/main/s/silo/silo_1.4.13a+git20070930-3_sparc.deb
http://archive.debian.org/debian/pool/main/s/silo/silo_1.4.13-1_sparc.deb
http://archive.debian.org/debian/pool/main/s/silo/silo_1.4.9-1_sparc.deb
http://archive.debian.org/debian/pool/main/s/silo/silo_1.2.5-2_sparc.deb
I'm on it, unless we get more than the usual number of Monday-morning
blowups.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]