Richard Mortimer wrote:
On 18/09/2012 18:49, Mark Morgan Lloyd wrote:

Good. I think that means that the LOM is definitely not involved in this problem.

Richard Mortimer wrote:

Hi Mark,

On 18/09/2012 14:36, Mark Morgan Lloyd wrote:

If I install either the current Wheezy/testing or Lenny on a Netra T1 200 with LOM (lomlite2) at 3.10, the first time OBP issues a boot command I get

[string of hex] Watchdog Reset Externally Initiated Reset

I have a feeling that this is not a LOM watchdog reset but more a SPARC processor watchdog reset (the processor running out of trap levels in memory fault/interrupt processing). You should be able to verify whether it is a LOM watchdog reset by running the "loghistory" command at the lom prompt.

No watchdog events shown, only power on/off and reset events (plus a 'LOM booted' near the start).

The IEC connector isn't really relevant to this. The LOM controls the power to the main CPU/circuit board. Actually, thinking about it, I think a hard reset (typing reset at the LOM prompt), a CPU watchdog reset and a power off/on will all cause full (power-on) reset processing to occur.

If I subsequently issue a second boot the system runs as expected.

If I'm correct then this is probably due to something like retained memory (not cleared during a soft reset/reboot, only cleared during a power cycle). That would explain why the second boot after the watchdog/XIR works fine.

But this also happens after a (soft) power-on, irrespective of whether power has been physically removed (i.e. IEC connector pulled out of the back and left for a few minutes).

But given that you said it happens after a (soft) power-on, maybe it isn't relevant anyway.

This affects both Lenny and Wheezy but does not affect Squeeze, i.e. it appears to be a regression. Since this happens between the OBP boot command and SILO's boot prompt, I presume that it is a SILO problem or that the installer is doing something odd to the disklabel.

Lenny: 1.4.13
Squeeze: 1.4.14
Wheezy: 1.4.14

I don't see how the LOM firmware would affect this. OBP maybe, but if it is a processor watchdog then I doubt it's the LOM. SILO would be my first suspect.

SILO is also my suspect (after a lot of fiddling trying to disable the LOM watchdog from OBP etc.) and those are SILO version numbers :-/

Brain wasn't turned on enough to realise that!

From memory, I don't think the LOM watchdog is ever enabled in OBP on the T1 200. It only ever gets enabled by the device drivers once Solaris is running (if the packages you mention below are installed, of course).
OK, but at the same time the README from Solaris patch 110208-21 explicitly says

5043823 Patch 110208-18 changes watchdog behavior and causes watchdog resets when probed

and

4412177 lomlite2 watchdog is not always disabled on "reboot" - 110208-07

both of which read as though there could be spurious watchdog events even without Solaris's intervention. However, I note your point about the LOM log not showing anything.
Should I be raising this as a bug, or can I assume that the people who need to know about it are already aware of the issue?
The correct way of fixing this is probably to upgrade the LOM firmware to 3.14. However, this requires Solaris, and before the patch can be installed the appropriate packages must be installed: "To use LOM commands you must install the Lights Out Management 2.0 packages (SUNWlomu, SUNWlomr and SUNWlomm) from the Solaris Supplementary CD." http://docs.oracle.com/cd/E19102-01/n1280.srvr/819-1269-11/poweron.html The problem is that I don't believe the Supplementary CD is freely available, which in practice means that this route is not open to most Linux users.

I'm hoping there's enough detail in there that it shows up on Google; it might save people work in the future.
-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]