Re: Netra T1 200 watchdog timeouts

On Sat, Sep 22, 2012 at 12:26:26PM +0100, Richard Mortimer wrote:
> On 19/09/2012 13:10, Mark Morgan Lloyd wrote:
> >Richard Mortimer wrote:
> >>On 18/09/2012 18:49, Mark Morgan Lloyd wrote:
> >>>Richard Mortimer wrote:
> ... snip ...
> >>>>>This affects both Squeeze and Wheezy but does not affect Lenny,
> >>>>>i.e. it
> >>>>>appears to be a regression. Since this happens in between the OBP boot
> >>>>>command and SILO's boot prompt, I presume that it is a SILO problem or
> >>>>>that the installer is doing something odd to the disklabel.
> >>>>>
> >>>>>Lenny:    1.4.13
> >>>>>Squeeze: 1.4.14
> >>>>>Wheezy:    1.4.14
> >>>>
> >>>>I don't see how the LOM firmware would affect this. OBP maybe but if
> >>>>it is a processor watchdog then I doubt it's LOM. SILO would be my
> >>>>first suspect.
> >>>
> >>>SILO is also my suspect (after a lot of fiddling trying to disable lom
> >>>watchdog from OBP etc.) and those are SILO version numbers :-/
> >>>
> >>Brain wasn't turned on enough to realise that!
> >>
> >> From memory I don't think the LOM watchdog is ever enabled in OBP on
> >>the T1 200. It only ever gets enabled by the device drivers once
> >>Solaris is running (if the packages you mention below are installed of
> >>course).
> >
> >OK but at the same time the README from Solaris patch 110208-21
> >explicitly says
> >
> >5043823  Patch 110208-18 changes watchdog behavior and causes watchdog
> >resets when probed
> >
> >and
> >
> >4412177  lomlite2 watchdog is not always disabled on "reboot" - 110208-07
> >
> >both of which read as though there could be spurious watchdog events
> >even without Solaris's intervention. However I note your point about the
> >LOM log not showing anything.
> I'm still pretty convinced that the problem you are seeing is
> nothing to do with LOM. I think that both of those are Solaris
> device driver issues too.
> >
> >Should I be raising this as a bug, or can I assume that the people who
> >need to know about it are already aware of the issue?
> Given that this affects Wheezy then a Debian bug is certainly in order.
> I haven't had time to track the development of Wheezy closely but I
> think that it is pretty much using upstream SILO. I vaguely remember
> a few changes upstream recently for both ext2/4 support and for cpu
> detection. One of those could be causing your problem on the Wheezy
> build.

Well, Mark mentioned that the same issue is encountered with both the
Wheezy and Squeeze SILO versions, which predate the recent ext2/4 changes.

And yes, there haven't been any Debian-specific changes to upstream
SILO since version 1.4.14+git20100228-1, uploaded in February 2010.
Before that we had some Debian-specific patches included.

Mark, if you can try different SILO versions and find out which one 
introduced the regression, that would be great. As far as I can tell, 
releases shipped with the following versions:

Lenny  : 1.4.13a+git20070930-3
Squeeze: 1.4.14+git20100228-1+b1 

Assuming that the failure was introduced between 1.4.13a+git20070930-3 
(Lenny version) and 1.4.14+git20100228-1+b1 (Squeeze version), you 
just have one intermediate version (1.4.14+git20100207-1) to test.
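
If it helps to keep track of the results, the search can be sketched as
a small bisection over the versions named above. This is only an
illustration: first_bad() and the is_bad callback are hypothetical names,
the "test" is really you installing a version, rebooting and noting
whether the watchdog fires, and the list order is copied from this mail
rather than computed with dpkg's version comparison.

```python
# Versions named in this thread, oldest first (order taken from the mail,
# not derived from Debian version-comparison rules).
VERSIONS = [
    "1.4.13a+git20070930-3",    # Lenny
    "1.4.14+git20100207-1",     # intermediate upload
    "1.4.14+git20100228-1+b1",  # Squeeze
]


def first_bad(versions, is_bad):
    """Return the earliest version for which is_bad(version) is true.

    Assumes the failure, once introduced, is present in every later
    version. Returns None if no version in the list is bad.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid          # failure is at mid or earlier
        else:
            lo = mid + 1      # failure was introduced after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    # Hypothetical results recorded by hand after each boot attempt:
    # True means the watchdog timeout was seen with that version.
    observed = {
        "1.4.13a+git20070930-3": False,
        "1.4.14+git20100207-1": True,
        "1.4.14+git20100228-1+b1": True,
    }
    print(first_bad(VERSIONS, lambda v: observed[v]))
```

With only three versions this is more bookkeeping than you need, but
the same function works unchanged if more uploads turn out to be worth
testing.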

> Given the nature of the problem I think it would be useful to have a
> good description of your installation in the bug. In particular
> filesystem layout (partition table), type (ext2/3/4) etc. may be
> relevant. A copy of the console session would be good to attach too.

Yep, a bug report would be useful. Given that it's the first report like
this that I have seen and that a simple enough workaround exists, I
don't think it qualifies as RC.

Best regards,
Jurij Smakov                                           jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC

