Re: watchdog

To: Debian Developers <debian-devel@lists.debian.org>
Subject: Re: watchdog
From: Bastian Blywis <blywis@zedat.fu-berlin.de>
Date: Sun, 17 Apr 2011 14:27:55 +0200
Message-id: <201104171427.56793.blywis@zedat.fu-berlin.de>
Reply-to: blywis@inf.fu-berlin.de
In-reply-to: <20110417104843.GA14777@feivel.credativ.lan>
References: <201104141447.17091.blywis@zedat.fu-berlin.de> <201104141631.06427.blywis@zedat.fu-berlin.de> <20110417104843.GA14777@feivel.credativ.lan>

> > Does it actually perform some kind of checks? What I got from the

> Watchdog itself? Yes, which ones depends on your configuration.

> wd_keepalive only triggers the hardware watchdog.

No, I meant wd_keepalive and not watchdog.

> > documentation is that it only writes to /dev/watchdog periodically

> > regardless what happens. Thus "basic watchdog functionality" would only

> > mean that it is checked if the userspace process is still running.

> Yes, if it doesn't the hw watchdog will reset the system.

Unfortunately, as I mentioned, it seems that in some scenarios wd_keepalive will happily continue to write to /dev/watchdog and keep the system from rebooting even when a reboot is warranted.
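
To make the concern concrete, here is a minimal Python sketch of an unconditional keepalive loop. It is not wd_keepalive's actual code: the function name, the parameters, and the in-memory stand-in for /dev/watchdog are assumptions for illustration only.

```python
import io
import time


def keepalive_loop(device, interval, ticks, sleep=time.sleep):
    """Write one byte to `device` every `interval` seconds, `ticks` times.

    Hypothetical helper: nothing about system health is checked here.
    As long as this process gets scheduled, the timer keeps being reset.
    """
    for _ in range(ticks):
        device.write(b"\0")  # on the Linux watchdog device, a write pings the timer
        device.flush()
        sleep(interval)


# Demo against an in-memory buffer instead of the real /dev/watchdog.
fake_device = io.BytesIO()
keepalive_loop(fake_device, interval=0, ticks=3)
print(len(fake_device.getvalue()), "pings written")
```

The loop only proves that the userspace process is still running; a hung service or a stalled shutdown script does not stop the pings.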

From my point of view this is not the desired behavior: the system administrator starts the watchdog deliberately, but it is then stopped in rc0 and rc6, so the (desired) reboot is prevented if something bad happens during shutdown.

There are several solutions to this problem:

1) Add a parameter in /etc/default/watchdog, e.g., START_WD_KEEPALIVE (best and easiest solution)

2) Move wd_keepalive to a separate package and let the administrator decide whether wd_keepalive should be installed and started when watchdog is stopped

3) Add a parameter to wd_keepalive so that it will only keep the system alive for a specific time. For example, in rc6 a timeout would trigger a hard reset even if this means that some services are not shut down properly. (most complex solution)
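
Option 3 could look roughly like the following Python sketch. It is only an illustration under stated assumptions: `bounded_keepalive`, `max_seconds`, and the injectable `clock` and `sleep` are hypothetical names, not existing wd_keepalive parameters, and the sketch assumes the device is not disarmed when the process stops writing.

```python
import io
import time


def bounded_keepalive(device, interval, max_seconds,
                      clock=time.monotonic, sleep=time.sleep):
    """Ping `device` every `interval` seconds until `max_seconds` have elapsed.

    Returns the number of pings written. Once this returns, nothing pings
    the device any more, so a real hardware watchdog would eventually
    expire and reset the machine (assumption: it stays armed).
    """
    deadline = clock() + max_seconds
    pings = 0
    while clock() < deadline:
        device.write(b"\0")
        device.flush()
        pings += 1
        sleep(interval)
    return pings


# Demo with an in-memory buffer and a very short grace period.
fake_device = io.BytesIO()
count = bounded_keepalive(fake_device, interval=0.01, max_seconds=0.05)
print("pings before giving up:", count)
```

With such a limit, a shutdown that finishes within the grace period is unaffected, while one that hangs in rc0 or rc6 ends in a hard reset instead of an unreachable machine.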

In the end it boils down to two opinions on how the watchdog and the system should behave:

1) Value a proper shutdown more highly than the risk of ending up with an unavailable system

2) Have a system that has a high availability but accept that a hard reset might be triggered in rc0 and rc6

If there is consensus that opinion 1 (the current state) is OK, I can understand that and will not complain, but a configuration option would be nice ;-)

Regards,

Bastian
